
Feature Detection with Automatic Scale Selection

01 Nov 1998, International Journal of Computer Vision (Kluwer Academic Publishers), Vol. 30, Iss. 2, pp. 79-116
TL;DR: It is shown how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation and how it can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure.
Abstract: The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select locally appropriate scales for further analysis. This article proposes a systematic methodology for dealing with this problem. A framework is presented for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γ-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure. Support for the proposed approach is given in terms of a general theoretical investigation of the behaviour of the scale selection method under rescalings of the input pattern and by integration with different types of early visual modules, including experiments on real-world and synthetic data. Support is also given by a detailed analysis of how different types of feature detectors perform when integrated with a scale selection mechanism and then applied to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation. In many computer vision applications, the poor performance of the low-level vision modules constitutes a major bottleneck. It is argued that the inclusion of mechanisms for automatic scale selection is essential if we are to construct vision systems to automatically analyse complex unknown environments.

Summary


1 Introduction

  • One of the very fundamental problems that arises when analysing real-world measurement data originates from the fact that objects in the world may appear in different ways depending upon the scale of observation.
  • Notably, the type of physical description that is obtained may be strongly dependent on the scale at which the world is modelled, and this is in clear contrast to certain idealized mathematical entities, such as “point” or “line”, which are independent of the scale of observation.
  • Under other circumstances, however, it may not be at all obvious how to determine the proper scales in advance.
  • A main intention behind constructing a multi-scale representation is to separate the image structures in the original image, such that fine-scale image structures exist only at the finest scales of the multi-scale representation.
  • This article addresses the problem of automatic scale selection in a more general setting, for wider classes of image descriptors.

1.1 Outline of the presentation

  • The presentation is organized as follows: Section 2 reviews the main concepts from scale-space theory the authors build upon.
  • Section 3 introduces the notion of normalized derivatives and illustrates how maxima over scales of normalized Gaussian derivatives reflect the frequency content in sine wave patterns.
  • This material serves as a preparation for section 4, which presents the proposed scale selection methodology and shows how it applies generally to a large class of differential descriptors.
  • Section 8 shows an example of how this approach applies to the computation of dense feature maps.
  • In a complementary paper (Lindeberg 1996a), it is shown in detail how this approach applies to edge detection and ridge detection.

2 Scale-space representation: Review

  • There are several mathematical results (Koenderink 1984; Babaud et al.
  • Interestingly, the results from these theoretical considerations are in qualitative agreement with the results of biological evolution.
  • Neurophysiological studies by Young (1985, 1987) have shown that there are receptive fields in the mammalian retina and visual cortex whose measured response profiles can be well modelled by Gaussian derivatives up to order four.
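For reference, the Gaussian scale-space representation reviewed in this section is conventionally written as below. This is the standard textbook formulation (an N-dimensional signal f, with the variance t of the Gaussian kernel as the scale parameter), given here as background rather than quoted from the article's numbered equations:

```latex
L(\cdot;\, t) = g(\cdot;\, t) * f, \qquad
g(x;\, t) = \frac{1}{(2\pi t)^{N/2}}\, e^{-|x|^2/(2t)}, \quad x \in \mathbb{R}^N,\ t > 0,

\partial_t L = \tfrac{1}{2}\, \nabla^2 L, \qquad L(\cdot;\, 0) = f,

L_{x^\alpha}(\cdot;\, t) = \partial_{x^\alpha} L(\cdot;\, t) = (\partial_{x^\alpha} g(\cdot;\, t)) * f .
```

The last line expresses that scale-space derivatives can be computed directly by convolution with Gaussian derivative kernels.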

3 Normalized derivatives and intuitive idea for scale selection

  • The decrease in amplitude with scale is a direct consequence of the non-enhancement property of local extrema, which states that, as the scale increases, the value at a local maximum cannot increase and the value at a local minimum cannot decrease.
  • In practice, this means that the amplitude of the variations in a signal will always decrease with scale.
  • In the special case when γ = 1, the normalized coordinates $\xi = x/t^{\gamma/2}$ and the associated normalized derivative operator $\partial_\xi = t^{\gamma/2}\,\partial_x$ are dimensionless.
  • As shown later, however, values of γ < 1 are highly useful when formulating scale selection mechanisms for edge detection and ridge detection.
  • For the sinusoidal signal $f(x) = \sin(\omega_0 x)$, the amplitude of the mth order normalized derivative as a function of scale is then given by $L_{\xi^m,\max}(t) = t^{m\gamma/2}\,\omega_0^m\,e^{-\omega_0^2 t/2}$ (11), i.e., it first increases and then decreases.
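Where this maximum lies follows from setting the logarithmic derivative of (11) to zero. The short derivation below is a worked check (assuming m ≥ 1 and γ > 0), not a quotation of the article's own equations:

```latex
\frac{\partial}{\partial t} \log L_{\xi^m,\max}(t)
  = \frac{m\gamma}{2t} - \frac{\omega_0^2}{2} = 0
  \quad\Longrightarrow\quad
  t_{\max} = \frac{\gamma m}{\omega_0^2},

\sigma_{\max} = \sqrt{t_{\max}} = \frac{\sqrt{\gamma m}}{\omega_0}
              = \frac{\sqrt{\gamma m}}{2\pi}\, \lambda_0,
  \qquad \lambda_0 = \frac{2\pi}{\omega_0}.
```

Since the second derivative of the logarithm, $-m\gamma/(2t^2)$, is negative, this stationary point is the unique maximum, and the selected scale measured in units of length is proportional to the wavelength $\lambda_0$.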

4 Proposed methodology for scale selection

  • The example above shows that, for a sinusoidal signal, the scale at which a normalized derivative assumes its maximum over scales (measured in units of length, $\sigma = \sqrt{t}$) is proportional to the wavelength of the signal.
  • Maxima over scales of normalized derivatives reflect the scales over which spatial variations take place in the signal.
  • This operation corresponds to an interesting computational structure, since it constitutes a way of estimating length from local measurements performed at only a single spatial point in the scale-space representation, without explicitly laying out a ruler.
  • Moreover, compared to a local windowed Fourier transform, there is no need to explicitly set a window size for computing the Fourier transform.
  • This principle is closely related, although not equivalent, to the method for scale selection previously proposed in (Lindeberg 1991, 1993a), where interesting scale levels were determined from maxima over scales of a normalized blob measure.
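The single-point length estimate can be tried numerically. For a sine wave, the maximum of (11) lies at $t_{\max} = \gamma m/\omega_0^2$ (set the derivative of its logarithm to zero), so the wavelength can be read off from the selected scale. The following is a minimal sketch, not the article's implementation: it assumes a 1D signal given as a Python function, sampled second-order Gaussian derivative kernels truncated at 5σ, m = 2, γ = 1 and a geometric grid of scales; all function names are hypothetical.

```python
import math

def gaussian_xx_kernel(t):
    """Sampled second-order Gaussian derivative g_xx(x; t), truncated at 5 standard deviations."""
    r = math.ceil(5.0 * math.sqrt(t))
    norm = 1.0 / math.sqrt(2.0 * math.pi * t)
    return [(x, (x * x / t ** 2 - 1.0 / t) * norm * math.exp(-x * x / (2.0 * t)))
            for x in range(-r, r + 1)]

def signature(signal, x0, scales, gamma=1.0):
    """Scale-space signature |t^gamma * L_xx(x0; t)| (m = 2, so t^(m*gamma/2) = t^gamma)."""
    out = []
    for t in scales:
        lxx = sum(w * signal(x0 - x) for x, w in gaussian_xx_kernel(t))
        out.append(abs(t ** gamma * lxx))
    return out

def estimate_wavelength(signal, x0, gamma=1.0):
    """Select the scale that maximizes the signature at x0 and map it to a wavelength."""
    m = 2
    scales = [1.02 ** k for k in range(300)]  # t from 1 to roughly 370
    sig = signature(signal, x0, scales, gamma)
    t_sel = scales[max(range(len(scales)), key=lambda i: sig[i])]
    # For a sine wave t_max = gamma * m / omega0^2, hence lambda0 = 2*pi*sqrt(t_max / (gamma*m)).
    return 2.0 * math.pi * math.sqrt(t_sel / (gamma * m))

wave = lambda x: math.cos(2.0 * math.pi * x / 40.0)  # wavelength: 40 samples
print(estimate_wavelength(wave, x0=0))  # close to 40
```

The point x0 = 0 is chosen where the second derivative of the cosine is largest in magnitude; at a zero crossing of L_xx the signature vanishes, which is the phase dependency discussed in section 8.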

4.1 General scaling property of local maxima over scales

  • A basic justification for the abovementioned arguments can be obtained from the fact that, for a large class of (possibly non-linear) combinations of normalized derivatives, maxima over scales transform in a simple and predictable way under rescalings of the intensity pattern.
  • Let us hence restrict the analysis to polynomial differential invariants that are homogeneous in the sense that the sum of the orders of differentiation is the same, M, for each term in the polynomial.
  • For a differential expression $\mathcal{D}L$ of this form, the corresponding normalized differential expression in each domain is given by $\mathcal{D}_{\gamma\text{-norm}} L = t^{M\gamma/2}\,\mathcal{D} L$ (26) and $\mathcal{D}'_{\gamma\text{-norm}} L' = t'^{M\gamma/2}\,\mathcal{D}' L'$ (27). From (23) it follows that these normalized differential expressions are related by $\mathcal{D}_{\gamma\text{-norm}} L = s^{M(1-\gamma)}\,\mathcal{D}'_{\gamma\text{-norm}} L'$ (28). Clearly, with γ-normalization for γ = 1, the magnitude of the normalized derivative is scale invariant.
  • Moreover, since the factor $s^{M(1-\gamma)}$ does not depend on scale, a local maximum over scales in one domain corresponds to a local maximum over scales in the other. Hence, even when γ ≠ 1, the authors can achieve sufficient scale invariance to support the proposed scale selection methodology.
  • From the transformation property (23), however, it is apparent that, when γ ≠ 1, this magnitude measure will be strongly dependent on the scale at which the maximum over scales is assumed.
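The step from (26) and (27) to (28) can be spelled out as follows, assuming the rescaling convention f(x) = f'(x') with x' = sx, so that t' = s²t and L(x; t) = L'(x'; t'); each of the M derivatives in a term then contributes one factor s. This is a worked derivation consistent with the equations above, not a quotation of the article:

```latex
\mathcal{D} L = s^{M}\, \mathcal{D}' L',

\mathcal{D}_{\gamma\text{-norm}} L
  = t^{M\gamma/2}\, \mathcal{D} L
  = \Bigl(\frac{t'}{s^{2}}\Bigr)^{M\gamma/2} s^{M}\, \mathcal{D}' L'
  = s^{M(1-\gamma)}\, t'^{M\gamma/2}\, \mathcal{D}' L'
  = s^{M(1-\gamma)}\, \mathcal{D}'_{\gamma\text{-norm}} L' .
```

Because the factor $s^{M(1-\gamma)}$ is independent of t, a local maximum over scales at $t_0$ maps to a local maximum at $t_0' = s^2 t_0$ for any γ, while the magnitude at the maximum is preserved only for γ = 1.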

4.2 The scale selection mechanism in practice

  • So far the authors have proposed a general methodology for scale selection by detecting local maxima in feature responses over scales.
  • Here, the authors shall not attempt to answer the question of which differential expression should be used in general.
  • Let us instead contend that the differential expression should at least be determined so as to capture the types of image structures under consideration.
  • The general approach to scale selection that will be proposed is to use these maximal responses over scales in the stage of detecting image features, i.e., when establishing the existence of different types of image structures.
  • The suggested framework naturally gives rise to two-stage algorithms, with feature detection at coarse scales followed by feature localization at finer scales.

4.3 Experiments: Scale-space signatures from real data

  • (To avoid sensitivity to the sign of these entities, and hence to the polarity of the signal, $\operatorname{trace}\mathcal{H}_{norm}L$ and $\det\mathcal{H}_{norm}L$ have been squared before presentation.)
  • These graphs are called the scale-space signatures of $(\operatorname{trace}\mathcal{H}_{norm}L)^2$ and $(\det\mathcal{H}_{norm}L)^2$, respectively.
  • This example illustrates that results in agreement with the proposed scale selection principle can be obtained also for real-world data (and for signals having a much richer frequency content than a single sine wave).
  • These particular differential expressions were selected here because they are useful for blob detection; see e.g.

4.4 Simultaneous detection of interesting points and scales

  • In figure 2, the signatures of the normalized differential entities were computed at the central point in each image.
  • These points were deliberately chosen to coincide with the centers of the sunflowers, where the blob response can be expected to be maximal under spatial perturbations.
  • Specific examples of this idea will be worked out in more detail in the following sections.
  • Referring to the invariance properties of local maxima over scales under rescalings of the input signal, the authors can observe that they transfer trivially to scale-space maxima.
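Detecting scale-space maxima amounts to comparing each sample of the normalized response with its neighbours in both space and scale. A minimal sketch, assuming the responses have already been computed on a regular grid `stack[k][y][x]` for a list of scales (a hypothetical data layout), using a 3×3×3 neighbourhood and skipping border samples; this is a common discretization, not necessarily the one used in the article:

```python
def scale_space_maxima(stack, scales, threshold=0.0):
    """Return (t, y, x, value) for every sample of stack[k][y][x] that exceeds the
    threshold and is strictly larger than all 26 neighbours in space and scale.
    Samples on the border of the grid (in space or in scale) are skipped."""
    K, H, W = len(stack), len(stack[0]), len(stack[0][0])
    offsets = [(dk, dy, dx) for dk in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dk, dy, dx) != (0, 0, 0)]
    found = []
    for k in range(1, K - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = stack[k][y][x]
                if v > threshold and all(v > stack[k + dk][y + dy][x + dx]
                                         for dk, dy, dx in offsets):
                    found.append((scales[k], y, x, v))
    # Strongest responses first, as when keeping only the N most significant features.
    return sorted(found, key=lambda p: -p[3])

# Usage: one isolated peak at scale index 2, plus a weaker neighbour that is not a maximum.
scales = [1.0, 2.0, 4.0, 8.0, 16.0]
stack = [[[0.0] * 7 for _ in range(7)] for _ in scales]
stack[2][3][4] = 1.0
stack[2][3][3] = 0.5
print(scale_space_maxima(stack, scales))  # [(4.0, 3, 4, 1.0)]
```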

5 Blob detection with automatic scale selection

  • Every scale-space maximum has been graphically illustrated by a circle centered at the point at which the spatial maximum is assumed, and with the size determined such that the radius (measured in pixel units) is proportional to the scale at which the maximum is assumed (measured in dimension length).
  • To reduce the number of blobs, a threshold on the maximum normalized response has been chosen such that only the 250 blobs with the largest normalized responses according to (30) remain.
  • The bottom row shows the result of superimposing these circles onto a brightened copy of the original image, as well as corresponding results for the normalized scale-space extrema of the square of the determinant of the Hessian matrix.
  • Corresponding experiments for a synthetic pattern (analysed in section 5.1) are given in figure 4.
  • Observe how these conceptually very simple differential geometric descriptors give a very reasonable description of the blob-like structures in the image (in particular concerning the blob size) considering how little information is used in the processing.
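The detection stage can be sketched end to end on a synthetic image. This is a minimal illustration under several assumptions that are not taken from the article: sampled Gaussian kernels with zero padding, a 5-point finite-difference Laplacian, γ = 1, a single Gaussian blob of variance t0 = 9, and a global maximum over space and scale in place of full local-maximum detection. All function names are hypothetical.

```python
import math

def gaussian_kernel(t):
    """Sampled 1D Gaussian of variance t, truncated at 4 standard deviations, unit sum."""
    r = max(1, math.ceil(4.0 * math.sqrt(t)))
    k = [math.exp(-(i * i) / (2.0 * t)) for i in range(-r, r + 1)]
    s = sum(k)
    return [v / s for v in k], r

def smooth(img, t):
    """Separable Gaussian smoothing; samples outside the image are treated as zero."""
    k, r = gaussian_kernel(t)
    h, w = len(img), len(img[0])
    rows = [[sum(k[j + r] * img[y][x + j] for j in range(-r, r + 1) if 0 <= x + j < w)
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[y + j][x] for j in range(-r, r + 1) if 0 <= y + j < h)
             for x in range(w)] for y in range(h)]

def normalized_laplacian_sq(img, t, gamma=1.0):
    """(t^gamma * (L_xx + L_yy))^2 with a 5-point Laplacian; zero on the image border."""
    L = smooth(img, t)
    h, w = len(L), len(L[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = L[y][x + 1] + L[y][x - 1] + L[y + 1][x] + L[y - 1][x] - 4.0 * L[y][x]
            out[y][x] = (t ** gamma * lap) ** 2
    return out

def strongest_blob(img, scales):
    """(response, t, y, x) of the largest normalized response over all positions and scales."""
    best = (-1.0, None, None, None)
    for t in scales:
        for y, row in enumerate(normalized_laplacian_sq(img, t)):
            for x, v in enumerate(row):
                if v > best[0]:
                    best = (v, t, y, x)
    return best

# Synthetic image: a unit-amplitude Gaussian blob of variance T0 = 9 centred at (16, 16).
N, C, T0 = 33, 16, 9.0
image = [[math.exp(-((x - C) ** 2 + (y - C) ** 2) / (2.0 * T0)) for x in range(N)]
         for y in range(N)]
scales = [T0 * 1.5 ** k for k in range(-3, 4)]
_, t_sel, y_sel, x_sel = strongest_blob(image, scales)
print(t_sel, y_sel, x_sel)  # 9.0 16 16
```

With γ = 1 the selected scale coincides with the variance of the blob, so a circle with radius proportional to √t_sel = 3 pixels would represent it, in the manner of the circles described above.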

5.1 Analysis of scale-space maxima for idealized model patterns

  • Whereas the theoretical analysis in section 4.1 applies generally to large classes of differential invariants and input signals, one may ask how the scale selection method for blob detection performs in specific situations.
  • The authors study two model patterns for which a closed-form solution of the diffusion equation can be calculated, and for which a complete analytical study is hence feasible. (Footnote 3: When detecting scale-space maxima in practice, there is, of course, no need to explicitly track the extrema along the extremum path in scale-space.)
  • There is a unique solution when the ratio ω2/ω1 is close to one, and three solutions when the ratio is sufficiently large.
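For the simplest model pattern, a rotationally symmetric Gaussian blob f = g(·; t0) in two dimensions, the selected scales can be worked out directly. The derivation below is a worked example under that assumption (with 0 < γ < 2), not a reproduction of the article's analysis:

```latex
L(\cdot;\, t) = g(\cdot;\, t_0 + t), \qquad
\nabla^2 L(0;\, t) = -\frac{1}{\pi (t_0 + t)^2}, \qquad
\det \mathcal{H} L(0;\, t) = \frac{1}{4\pi^2 (t_0 + t)^4},

\frac{\partial}{\partial t} \log\bigl(t^{\gamma}\, |\nabla^2 L(0;\, t)|\bigr)
  = \frac{\gamma}{t} - \frac{2}{t_0 + t} = 0
  \quad\Longrightarrow\quad
  t_{\max} = \frac{\gamma\, t_0}{2 - \gamma},

\frac{\partial}{\partial t} \log\bigl(t^{2\gamma} \det \mathcal{H} L(0;\, t)\bigr)
  = \frac{2\gamma}{t} - \frac{4}{t_0 + t} = 0
  \quad\Longrightarrow\quad
  t_{\max} = \frac{\gamma\, t_0}{2 - \gamma}.
```

For γ = 1, both the normalized Laplacian and the normalized determinant of the Hessian select $t_{\max} = t_0$, i.e. the selected scale equals the variance of the blob.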

5.2 Comparisons with fixed-scale blob detection

  • Figure 6 shows the result of computing spatial maxima at different scales in the response of the Laplacian operator from the sine wave pattern in figure 4.
  • At each scale, the 50 strongest responses have been extracted.
  • As can be seen, small blobs are given the highest relative ranking at fine scales, whereas large blobs are given the highest relative ranking at coarse scales.
  • Hence, a blob detector of this type (operating at a single predetermined scale) induces a bias towards image structures of a certain size.
  • (In contrast, the measure of blob strength associated with the scale-space maxima is, as was shown above, strictly scale invariant.)
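The bias of a fixed-scale detector, and its removal by scale selection, can be reproduced in closed form. A minimal sketch, assuming unit-amplitude Gaussian blobs exp(-(x² + y²)/(2 t0)) (rescaled copies of one pattern) and γ = 1: smoothing such a blob to scale t gives L(0; t) = t0/(t0 + t) at the centre, so t ∇²L(0; t) = -2 t t0/(t0 + t)², whose magnitude is maximal over scales at t = t0. Function names are hypothetical.

```python
def center_response(t, t0):
    """Normalized Laplacian t * (L_xx + L_yy) at the centre of a unit-amplitude
    Gaussian blob of variance t0 after smoothing to scale t (gamma = 1)."""
    return -2.0 * t * t0 / (t0 + t) ** 2

def ranking_at(t_fixed, sizes):
    """Blob sizes ordered by response strength when every blob is measured at t_fixed."""
    return sorted(sizes, key=lambda t0: -abs(center_response(t_fixed, t0)))

sizes = [1.0, 4.0, 16.0, 64.0]
print(ranking_at(1.0, sizes))   # [1.0, 4.0, 16.0, 64.0]: small blobs win at fine scales
print(ranking_at(64.0, sizes))  # [64.0, 16.0, 4.0, 1.0]: large blobs win at coarse scales
# With scale selection (t = t0 for every blob), all blobs get the same strength.
print([abs(center_response(t0, t0)) for t0 in sizes])  # [0.5, 0.5, 0.5, 0.5]
```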

5.3 Applications of blob detection with automatic scale selection

  • Following the previously presented arguments, the authors argue that a scale selection mechanism is an essential complement to any blob detector aimed at handling large size variations in the image structures.
  • In addition, scale information associated with such adaptively computed image descriptors may serve as an important cue in its own right.
  • In (Bretzner and Lindeberg 1996, 1998) an application to feature tracking is presented, where (i) the scale information constitutes a key component in the criterion for matching image features over time, and (ii) the scale selection mechanism is essential for the vision system to be able to capture objects under large size variations over time.

6.1 Selection of detection scales from normalized scale-space maxima

  • A commonly used entity for junction detection is the curvature of level curves in intensity data multiplied by the gradient magnitude (Kitchen and Rosenfeld 1982; Dreschler and Nagel 1982; Koenderink and Richards 1988; Noble 1988; Deriche and Giraudon 1990; Blom 1992; Florack et al. 1992; Lindeberg 1994d).
  • To reduce the number of junction candidates, the scale-space maxima have been sorted with respect to a saliency measure.
  • Finally, the 50 most significant junction candidates according to this ranking have been displayed.
  • Of course, thresholding on the magnitude of the operator response constitutes a coarse selective mechanism for feature detection.
  • Nevertheless, note that this operation gives rise to a set of junction candidates with reasonable interpretations in the scene.
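The composed scheme in section 7.4 uses a rescaled variant of this entity, $\tilde{\kappa} = L_y^2 L_{xx} - 2L_xL_yL_{xy} + L_x^2L_{yy}$, which equals the level-curve curvature times the gradient magnitude cubed (up to the sign convention) and is therefore a polynomial. Since each of its terms contains four derivatives, its γ-normalized form carries the factor $t^{2\gamma}$. A small sketch, with hypothetical function names and derivative values supplied by the caller (for example from Gaussian derivative filters), plus a check on a function whose level curves are circles:

```python
import math

def kappa_tilde(Lx, Ly, Lxx, Lxy, Lyy):
    """Rescaled level-curve curvature: level-curve curvature times |grad L|^3."""
    return Ly ** 2 * Lxx - 2.0 * Lx * Ly * Lxy + Lx ** 2 * Lyy

def kappa_tilde_norm(t, Lx, Ly, Lxx, Lxy, Lyy, gamma=1.0):
    """gamma-normalized version: every term contains four derivatives (M = 4),
    so the normalization factor is t^(M * gamma / 2) = t^(2 * gamma)."""
    return t ** (2.0 * gamma) * kappa_tilde(Lx, Ly, Lxx, Lxy, Lyy)

# Check on L(x, y) = x^2 + y^2, whose level curves are circles: at the point (r, 0)
# the derivatives are Lx = 2r, Ly = 0, Lxx = Lyy = 2, Lxy = 0, and the curvature is 1/r.
r = 5.0
derivs = (2.0 * r, 0.0, 2.0, 0.0, 2.0)
curvature = kappa_tilde(*derivs) / math.hypot(derivs[0], derivs[1]) ** 3
print(curvature)                       # 0.2
print(kappa_tilde_norm(2.0, *derivs))  # 800.0 = 2^2 * 200
```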

6.2 Analysis of scale-space maxima for diffuse junction models

  • To obtain an intuitive understanding of the qualitative behaviour of the scale selection method in this case, let us analyse a simple junction model for which a closed-form analysis can be carried out without too much effort.
  • Unfortunately, the equation that determines the position of the spatial maximum in $\tilde{\kappa}^2$ over scales is non-trivial to handle (it contains a non-linear combination of the Gaussian function, the primitive function of the Gaussian, and polynomials).
  • This function can be regarded as a coarse model of the behaviour at scales so coarse that the shape distortions are substantial and the overall shape of a finite-size object is severely affected.
  • Hence, selecting scale levels (and spatial points) where $\tilde{\kappa}^2_{norm}$ assumes maxima over scales can be expected to give scale levels in the intermediate scale range (where a finite-extent junction model constitutes a reasonable approximation).

6.3 Experiments: Scale-space signatures in junction detection

  • Figure 9 illustrates these effects for synthetic L-junctions with varying degrees of diffuseness.
  • In other words, the scale at which the maximum over scales is assumed indicates the spatial extent (the size) of the region for which a junction model is consistent with the grey-level data (in agreement with the suggested scale selection principle).
  • It shows scale-space maxima of $\tilde{\kappa}^2_{norm}$ computed from a synthetic image containing corner structures at different scales.
  • The original greylevel image is shown in the ground plane, and each scale-space maximum has been graphically visualized by a sphere centered at the position (x0; t0) in scale-space at which the scale-space maximum was assumed.
  • More results on corner detection, including a complementary mechanism for accurate corner localization, are presented in section 7.

7 Feature localization with automatic scale selection

  • The scale selection methodology presented so far applies to the detection of image features, and the role of the scale selection mechanism is to estimate the approximate size of the image structures the feature detector responds to.
  • Whereas this approach provides a conceptually simple way to express various feature detectors, such as a junction detector, which automatically adapts its scale levels to the local image structure, it is not guaranteed that the spatial positions of the scale-space maxima constitute accurate estimates of the corner locations.
  • The local maxima over scales may be assumed at rather coarse scales, where the drift due to scale-space smoothing is substantial and adjacent features may interfere with each other.
  • For this reason, it is natural to complement the initial feature detection step by an explicit feature localization stage.
  • The subject of this section is to show how a mechanism for automatic scale selection can be formulated in this context, by minimizing normalized measures of inconsistency over scales.

7.1 Corner localization by local consistency

  • Minimizing this expression corresponds to finding the point x that minimizes the weighted integral of the squared distances from x to all edge tangent lines $l_{x'}$ in the neighbourhood; see figure 12.
  • ($D_{x'}(x)$ is the distance from x to $l_{x'}$ multiplied by the gradient magnitude, and the window function gives stronger weights to points in a neighbourhood of $x_0$.)
  • The overall intention of this formulation is that for an image pattern containing a junction, the point x that minimizes (57) should constitute a better estimate of the projection of the physical junction than $x_0$.
  • The minimization problem has an explicit solution in terms of local image statistics.
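Because the criterion is quadratic in x, its minimizer solves a 2×2 linear system. Writing the weighted sums as A = Σ w ∇L ∇Lᵀ, b = Σ w ∇L (∇Lᵀx') and c = Σ w (∇Lᵀx')², the optimum is x = A⁻¹b with residual d_min = c - bᵀA⁻¹b. The sketch below replaces the article's window integrals by a discrete weighted sum over samples and normalizes the residual by trace A; both choices are assumptions made for illustration, and the function name is hypothetical.

```python
def localize_corner(samples):
    """samples: iterable of (px, py, gx, gy, w) = position x', gradient of L at x', weight.
    Minimizes sum of w * (grad L(x')^T (x - x'))^2 over the point x.
    Returns ((x, y), d_min, d_min / trace(A))."""
    a11 = a12 = a22 = b1 = b2 = c = 0.0
    for px, py, gx, gy, w in samples:
        proj = gx * px + gy * py          # grad L(x')^T x'
        a11 += w * gx * gx
        a12 += w * gx * gy
        a22 += w * gy * gy
        b1 += w * gx * proj
        b2 += w * gy * proj
        c += w * proj * proj
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("degenerate: all gradients parallel (e.g. a straight edge)")
    x = (a22 * b1 - a12 * b2) / det        # x = A^{-1} b, written out for 2x2
    y = (a11 * b2 - a12 * b1) / det
    d_min = c - (b1 * x + b2 * y)          # residual at the optimum: c - b^T A^{-1} b
    return (x, y), d_min, d_min / (a11 + a22)

# Ideal polygon-type L-corner at (3, 5): a horizontal edge (gradient (0, 1)) and a
# vertical edge (gradient (1, 0)) meeting at the junction point, unit weights.
corner = ([(3.0 + k, 5.0, 0.0, 1.0, 1.0) for k in range(1, 6)] +
          [(3.0, 5.0 + k, 1.0, 0.0, 1.0) for k in range(1, 6)])
print(localize_corner(corner))  # ((3.0, 5.0), 0.0, 0.0)
```

For this ideal junction all edge tangents pass through the corner, so the residual is exactly zero; a single straight edge gives a singular A and no unique point.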

7.2 Automatic selection of localization scales

  • The formulation in the previous section, however, leaves two major problems open: how to choose the window function, and at which scale to compute the image gradients. [...] Moreover, let the scale value of this window function be proportional to the detection scale $t_0$ at which the maximum over scales in $\tilde{\kappa}^2_{norm}$ was assumed.
  • Specifically, scale selection by minimizing the normalized residual $\tilde{r}$ (65) over scales corresponds to selecting the scale that minimizes the estimated inaccuracy in the localization estimate.
  • Thus, when smoothing is necessary, the residual will decrease.
  • This can be easily understood by observing that for an ideal polygon-type junction (consisting of regions of uniform grey-level delimited by straight edges), all edge tangents meet at the junction point, which means that the residual d̃min is exactly zero.

7.3 Experiments: Choice of localization scale

  • Figure 13 and figure 14 show the result of applying this scale selection mechanism to a sharp and a diffuse corner with different amounts of added white Gaussian noise.
  • As can be seen, the results agree with the qualitative discussion above.
  • For each noise level, this table gives the scale at which the normalized residual assumes its minimum over scales, as well as the scale at which the estimate with the minimum absolute error over scales is obtained.
  • The results show that the normalized residual serves as an estimate of the inaccuracy in the corner localization estimate, and specifically that the scale at which $\tilde{d}_{min}$ assumes its minimum over scales is a reasonable estimate of the scale at which the localization estimate has the minimum absolute error.
  • Figure 15 shows the result of applying the composed junction localization stage to the junction candidates in figure 8.

7.4 Composed scheme for junction detection and localization

  • To summarize, the composed two-stage scheme for junction detection and junction localization consists of the following processing steps (see footnote 5):
  • 1. Detection. Detect scale-space maxima in the square of the normalized rescaled level curve curvature $\tilde{\kappa}_{norm} = t^{2\gamma}\tilde{\kappa} = t^{2\gamma}(L_{x_2}^2 L_{x_1x_1} - 2L_{x_1}L_{x_2}L_{x_1x_2} + L_{x_1}^2 L_{x_2x_2})$ (or some other suitable normalized differential entity).
  • This generates a set of junction candidates.
  • Footnote 5: Besides the general descriptions given in previous sections.
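The overall control flow of the composed scheme can be sketched as below. Only the detection step is spelled out above; the second stage here follows the localization idea of sections 7.1 and 7.2 (window scale tied to the detection scale, localization scale chosen by minimizing the normalized residual). The callables `detect` and `localize`, their signatures and the parameter `n_strongest` are hypothetical placeholders, not the article's interface.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (x, y, detection scale, strength) for one scale-space maximum of the detection operator
Candidate = Tuple[float, float, float, float]

def two_stage_junctions(
    detect: Callable[[], List[Candidate]],
    localize: Callable[[float, float, float, float], Tuple[float, float, float]],
    localization_scales: Sequence[float],
    n_strongest: int,
) -> List[Tuple[float, float, float, float]]:
    """Stage 1: keep the n_strongest junction candidates.
    Stage 2: for each candidate, run localize(x0, y0, t_detect, t_loc) at every
    localization scale and keep the (x, y, t_loc, residual) with the smallest
    normalized residual."""
    candidates = sorted(detect(), key=lambda c: -c[3])[:n_strongest]
    results = []
    for x0, y0, t_det, _strength in candidates:
        best: Optional[Tuple[float, float, float, float]] = None
        for t_loc in localization_scales:
            x, y, residual = localize(x0, y0, t_det, t_loc)
            if best is None or residual < best[3]:
                best = (x, y, t_loc, residual)
        results.append(best)
    return results

# Usage with stand-in functions (illustration only).
def fake_detect():
    return [(10.0, 10.0, 16.0, 5.0), (30.0, 30.0, 4.0, 1.0), (50.0, 50.0, 9.0, 3.0)]

def fake_localize(x0, y0, t_det, t_loc):
    # Pretend the residual is smallest at t_loc = 2 and the corner lies half a pixel away.
    return x0 + 0.5, y0 - 0.5, (t_loc - 2.0) ** 2 + 0.1

print(two_stage_junctions(fake_detect, fake_localize, [1.0, 2.0, 4.0], n_strongest=2))
# [(10.5, 9.5, 2.0, 0.1), (50.5, 49.5, 2.0, 0.1)]
```

As in the article's experiments, the number of candidates kept (`n_strongest` here) is the only parameter the caller has to choose.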

7.5 Further experiments

  • Concerning the number of junction candidates to be processed and passed on to later processing stages, the authors have not made any attempts in this work to decide automatically how many of the extracted junction candidates correspond to physical junctions in the world.
  • The authors argue that such decisions require integration with higher-level reasoning and verification processes, and may be extremely hard to make at the earliest processing stages unless additional information is available about the external conditions.
  • For this reason, this module only aims at computing an early ranking of image features by significance, which a vision system can use to process features in decreasing order of significance.
  • Footnote 6: An integrated vision system for analysing junctions by actively zooming in to interesting structures is presented in (Brunnström et al. 1992; Lindeberg 1993a).


  • In line with this idea, the results are shown in terms of the N strongest junction candidates for different (manually chosen) values of N .
  • In figure 17, which shows corresponding examples for more cluttered scenes, the number of junctions displayed has been increased to 100 and 200.
  • Notably, this number of junction candidates constitutes the only essential tuning parameter of the composed algorithm.
  • Here, the 10 most significant junctions have been processed.
  • The table in figure 19 shows numerical values exemplifying how large the localization errors can be in the different processing stages.

7.6 Applications of corner detection with automatic scale selection

  • In (Lindeberg and Li 1995, 1997) it is shown how the support region associated with each junction allows for conceptually simple matching between junctions and edges based on spatial overlap only and without any need for providing externally determined thresholds on e.g. distance.
  • Then, the matching relations between edges and junction cues that arise in this way are used in a pre-processing stage for classifying edges into straight and curved.
  • In (Bretzner and Lindeberg 1996) it is demonstrated how these support regions can be used for simplifying matching of junctions over time in tracking algorithms.
  • Specifically, it is shown that the scale selection mechanism in the junction detector is essential to capture junctions that undergo large size changes.
  • In (Lindeberg 1995a, 1996d) a scale selection principle for stereo matching and flow estimation is presented, which also involves the extension of a fixed scale least squares estimation problem to optimization over multiple scales.

7.7 Extensions of the junction detection method

  • The main purpose of the presentation in this section has been to make explicit how a scale selection mechanism can be incorporated into a junction detector.
  • When building a stand-alone junction detector, there are a few additional mechanisms that are natural to include.
  • Concerning the ranking on significance, the authors can conceive linking the maxima of the junction responses across scales in a similar way as done in the scale-space primal sketch (Lindeberg 1993a), register scale-space events such as bifurcations, and include the scale-space lifetime of each junction response into the significance measure.
  • Concerning the region of interest associated with each junction candidate, the authors have throughout this work represented the support region of a scale-space maximum by a circle with area reflecting the detection scale.
  • A possible limitation of this approach is that nearby junctions may lead to interference effects in operations such as the localization stage.

7.8 Extensions to edge detection

  • Concerning more general applications of the proposed methodology, it should be noted that the scale selection method for junction localization applies to edge detection as well.
  • The columns show, from left to right: (i) the local grey-level pattern, (ii) the signature of $\tilde{d}_{min}$ computed at the central point, and (iii) edges detected at the scale $t_d = \operatorname{argmin}_t \tilde{d}_{min}$ at which the minimum over scales in $\tilde{d}_{min}$ was assumed.
  • In the first row, it can be seen that edge detection at $\operatorname{argmin}_t \tilde{d}_{min}$ yields a coherent edge descriptor corresponding to the dominant edge structure in this region.
  • In the second row, a large amount of white Gaussian noise has been added to the grey-level image, and the minimum over scales is assumed at a much coarser scale.
  • Concerning these experiments it should be pointed out that they are mainly intended to demonstrate the potential in applying the proposed method for selecting localization scales to the problem of edge detection, and that further processing steps are needed to give a complete algorithm.

8 Dense frequency estimation

  • So far, the authors have seen how the scale selection methodology can be applied to the detection of sparse feature points.
  • If a scale selection mechanism for computing dense image descriptors were based on a partial derivative of the intensity function, such as the Laplacian operator, an obvious problem would be large spatial variations in the operator response.
  • A common methodology in signal processing for reducing this so-called phase dependency is to use quadrature filter pairs, defined (from a Hilbert transform) in such a way that the sum of the squared filter responses is constant for any sine wave.
  • (As will be shown below, this scale value is of the same order of magnitude as the scales that maximize QL over scales; compare also with section 3.)
  • In the abovementioned sources, these specific entities and normalization parameters are shown to be useful for edge detection and ridge detection with automatic scale selection.
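The phase dependency, and how a quadrature-type combination removes it, can be checked analytically for a sine wave f(x) = sin(ω0 x), whose scale-space representation is L(x; t) = e^{-ω0² t/2} sin(ω0 x). The sketch assumes the combination t L_x² + C t² L_xx² (γ = 1) with a free constant C; this form is an illustrative assumption in the spirit of the quadrature entity discussed above, not a quotation of the article's definition. At t = 1/(C ω0²) the two terms have equal weight and their sum is independent of position, while the normalized second derivative t L_xx alone oscillates with the phase.

```python
import math

def sine_derivatives(x, t, omega):
    """L_x and L_xx of the scale-space representation of sin(omega * x) at scale t."""
    a = math.exp(-omega ** 2 * t / 2.0)
    return a * omega * math.cos(omega * x), -a * omega ** 2 * math.sin(omega * x)

def quadrature_response(x, t, omega, C):
    """Assumed quadrature-type combination t * L_x^2 + C * t^2 * L_xx^2 (gamma = 1)."""
    lx, lxx = sine_derivatives(x, t, omega)
    return t * lx ** 2 + C * t ** 2 * lxx ** 2

omega, C = 2.0 * math.pi / 40.0, 1.0
t_star = 1.0 / (C * omega ** 2)   # scale at which both terms are balanced
xs = [0.5 * i for i in range(80)] # one full period, x = 0 .. 39.5
second = [t_star * sine_derivatives(x, t_star, omega)[1] for x in xs]
quad = [quadrature_response(x, t_star, omega, C) for x in xs]
print(max(second) - min(second))  # about 1.21: strong dependence on position
print(max(quad) - min(quad))      # about 0: independent of position
```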

9.3 Relations to previous work

  • Such L1-normalized kernels of first order have been used, for example, in edge detection and edge classification by (Korn 1988), (Mallat and Zhong 1992), and (Zhang and Bergholm 1993), and in pyramids by (Crowley and Parker 1984).
  • More generally, evolution properties across scales of wavelet transforms have been used by (Mallat and Hwang 1992) for characterizing local Lipschitz exponents of singularities.
  • There is also a connection to the “top point” representation proposed by (Johansen et al. 1986), in the sense that the points in scale-space at which bifurcations occur serve to delimit extremum paths with different topology.
  • A main difference between the scale selection mechanism suggested here and the work in (Lindeberg 1991) and (Mallat and Hwang 1992), however, is that here it is shown how these notions can be applied to large classes of non-linear differential invariants computed in a scale-space representation.
  • Moreover, feature detection algorithms have been formulated with integrated scale selection mechanisms and it has been shown how different derivative normalization approaches lead to different classes of differential expressions for which the scale selection mechanism commutes with rescalings of the input pattern.

10 Summary and discussion

  • The authors have argued that the subject of scale selection is essential to many problems in computer vision and automated image analysis.
  • A general scale selection principle has been presented stating that in the absence of other evidence, coarse estimates of the size of image structures can be computed from the scales at which normalized differential geometric descriptors assume maxima over scales.
  • Image features can thus be detected at adapted coarse scales, and then localized at finer scales in a second processing stage.
  • Whereas the general advantages of such a two-stage approach to feature detection are well-known in the literature, a major contribution here is that explicit mechanisms are provided for automatic selection of the detection scales as well as the localization scales.
  • Moreover, these processing stages are integrated into algorithms that are essentially free from tuning parameters other than the number of features of interest.

10.1 Technical contributions

  • At a technically more detailed level, some of the main contributions are that:
  • It is emphasized how the evolution properties over scales of normalized scale-space derivatives differ from those of traditional spatial derivatives.
  • A general principle for scale selection has been proposed, stating that extrema over scales in the signature of normalized differential entities are useful in the stage of detecting image features.
  • The problem of junction detection is treated more extensively, and the resulting method is the first junction detection algorithm with automatic scale selection.
  • Specifically, it is shown how localization scales can be selected automatically by minimizing a certain normalized residual across scales.


Feature Detection with Automatic Scale Selection
Tony Lindeberg
Computational Vision and Active Perception Laboratory (CVAP)
Department of Numerical Analysis and Computing Science
KTH (Royal Institute of Technology)
S-100 44 Stockholm, Sweden.
http://www.nada.kth.se/~tony
Email: tony@nada.kth.se
Technical report ISRN KTH/NA/P–96/18–SE, May 1996, Revised August 1998.
Int. J. of Computer Vision, vol 30, number 2, 1998. (In press).
Abstract
The fact that objects in the world appear in different ways depending on the
scale of observation has important implications if one aims at describing them. It
shows that the notion of scale is of utmost importance when processing unknown
measurement data by automatic methods. In their seminal works, Witkin (1983)
and Koenderink (1984) proposed to approach this problem by representing image
structures at different scales in a so-called scale-space representation. Traditional
scale-space theory building on this work, however, does not address the problem
of how to select locally appropriate scales for further analysis.
This article proposes a systematic methodology for dealing with this problem.
A framework is proposed for generating hypotheses about interesting scale levels
in image data, based on a general principle stating that local extrema over scales
of different combinations of γ-normalized derivatives are likely candidates to
correspond to interesting structures. Specifically, it is shown how this idea can
be used as a major mechanism in algorithms for automatic scale selection, which
adapt the local scales of processing to the local image structure.
Support for the proposed approach is given in terms of a general theoretical
investigation of the behaviour of the scale selection method under rescalings of
the input pattern and by experiments on real-world and synthetic data. Support
is also given by a detailed analysis of how different types of feature detectors
perform when integrated with a scale selection mechanism and then applied to
characteristic model patterns. Specifically, it is described in detail how the
proposed methodology applies to the problems of blob detection, junction detection,
edge detection, ridge detection and local frequency estimation.
In many computer vision applications, the poor performance of the low-level
vision modules constitutes a major bottleneck. It will be argued that the
inclusion of mechanisms for automatic scale selection is essential if we are to construct
vision systems to analyse complex unknown environments.
Keywords: scale, scale-space, scale selection, normalized derivative, feature
detection, blob detection, corner detection, frequency estimation, Gaussian derivative,
scale-space, multi-scale representation, computer vision
This work was partially performed under the ESPRIT-BRA project INSIGHT and the ESPRIT-
NSF collaboration DIFFUSION. The support from the Swedish Research Council for Engineering
Sciences, TFR, is gratefully acknowledged. The three-dimensional illustrations in figure 5 and
figure 11 have been generated with the kind assistance of Pascal Grostabussiat.

ii Lindeberg
Contents
1 Introduction
1.1 Outline of the presentation
2 Scale-space representation: Review
3 Normalized derivatives and intuitive idea for scale selection
4 Proposed methodology for scale selection
4.1 General scaling property of local maxima over scales
4.2 The scale selection mechanism in practice
4.3 Experiments: Scale-space signatures from real data
4.4 Simultaneous detection of interesting points and scales
5 Blob detection with automatic scale selection
5.1 Analysis of scale-space maxima for idealized model patterns
5.2 Comparisons with fixed-scale blob detection
5.3 Applications of blob detection with automatic scale selection
6 Junction detection with automatic scale selection
6.1 Selection of detection scales from normalized scale-space maxima
6.2 Analysis of scale-space maxima for diffuse junction models
6.3 Experiments: Scale-space signatures in junction detection
7 Feature localization with automatic scale selection
7.1 Corner localization by local consistency
7.2 Automatic selection of localization scales
7.3 Experiments: Choice of localization scale
7.4 Composed scheme for junction detection and localization
7.5 Further experiments
7.6 Applications of corner detection with automatic scale selection
7.7 Extensions of the junction detection method
7.8 Extensions to edge detection
8 Dense frequency estimation
9 Analysis and interpretation of normalized derivatives
9.1 Interpretation of γ-normalized derivatives in terms of L_p-norms
9.2 Interpretation in terms of self-similar Fourier spectrum
9.3 Relations to previous work
10 Summary and discussion
10.1 Technical contributions
A Appendix
A.1 Necessity of the form of the γ-parameterized derivative operator
A.2 L_p-normalization interpretation of γ-normalized derivatives
A.3 Normalized derivative responses to self-similar power spectra
A.4 Discrete implementation of the scale selection mechanisms

Feature detection with automatic scale selection 1
1 Introduction
One of the very fundamental problems that arises when analysing real-world
measurement data originates from the fact that objects in the world may appear in
different ways depending upon the scale of observation. This fact is well-known in
physics, where phenomena are modelled at several levels of scale, ranging from
particle physics and quantum mechanics at fine scales, through thermodynamics and solid
mechanics dealing with every-day phenomena, to astronomy and relativity theory at
scales much larger than those we are usually dealing with. Notably, the type of
physical description that is obtained may be strongly dependent on the scale at which
the world is modelled, and this is in clear contrast to certain idealized mathematical
entities, such as “point” or “line”, which are independent of the scale of observation.
In certain controlled situations, appropriate scales for analysis may be known a
priori. For example, a desirable property of a good physicist is his intuitive ability
to select appropriate scales to model a given situation. Under other circumstances,
however, it may not be at all obvious how to determine the proper scales in
advance. One such example is a vision system with the task of analysing unknown scenes.
Besides the inherent multi-scale properties of real world objects (which, in general,
are unknown), such a system has to face the problems that the perspective mapping
gives rise to size variations, that noise is introduced in the image formation process,
and that the available data are two-dimensional data sets reflecting only indirect
properties of a three-dimensional world. To be able to cope with these problems, an
image representation that explicitly incorporates the notion of scale is a crucially
important tool whenever we attempt to interpret sensory data, such as images, by
automatic methods.
In computer vision and image processing, these insights have led to the
construction of multi-scale representations of image data, obtained by embedding any
given signal into a one-parameter family of derived signals (Burt 1981; Crowley
1981; Witkin 1983; Koenderink 1984; Yuille and Poggio 1986; Florack et al. 1992;
Lindeberg 1994d; Haar Romeny 1994). This family should be parameterized by a scale
parameter and be generated in such a way that fine-scale structures are successively
suppressed when the scale parameter is increased. A main intention behind this
construction is to obtain a separation of the image structures in the original image,
such that fine-scale image structures only exist at the finest scales in the multi-scale
representation. Thereby, the task of operating on the image data will be simplified,
provided that the operations are performed at sufficiently coarse scales where
unnecessary and irrelevant fine-scale structures have been suppressed. Empirically, this idea
has proved to be extremely useful, and multi-scale representations such as pyramids,
scale-space representation and non-linear diffusion methods are commonly used as
preprocessing steps to a large number of early visual operations, including feature
detection, stereo matching, optic flow, and the computation of shape cues.
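The following minimal sketch shows how such a representation suppresses fine-scale structure. It computes a one-dimensional Gaussian scale-space by convolving with a sampled and renormalized Gaussian of variance t, which has standard deviation √t. Several parts are illustrative assumptions: the kernel is truncated at four standard deviations, the boundaries are handled by edge replication, and the function names are our own. The sketch is not claimed to be the discretization used in the paper.

```python
import math


def sampled_gaussian(t: float, radius: int) -> list[float]:
    """Gaussian of variance t sampled at n = -radius..radius and renormalized to sum 1."""
    w = [math.exp(-n * n / (2.0 * t)) for n in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def scale_space_1d(f: list[float], t: float) -> list[float]:
    """Approximate L(.; t) = g(.; t) * f for a sampled 1-D signal f (hypothetical helper).

    The signal is extended by repeating its end values. Because the kernel is
    symmetric, the correlation computed below equals a convolution.
    """
    if t < 0:
        raise ValueError("scale parameter t must be non-negative")
    if t == 0:
        return list(f)  # t = 0 corresponds to the original signal
    radius = max(1, math.ceil(4.0 * math.sqrt(t)))  # truncate at 4 standard deviations
    kernel = sampled_gaussian(t, radius)
    n = len(f)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp to the signal support
            acc += w * f[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    impulse = [0.0] * 41
    impulse[20] = 1.0
    for t in (1.0, 4.0, 9.0):
        smoothed = scale_space_1d(impulse, t)
        variance = sum((i - 20) ** 2 * v for i, v in enumerate(smoothed))
        print(f"t = {t}: peak = {max(smoothed):.4f}, variance = {variance:.4f}")
```

Smoothing a unit impulse returns the kernel itself, and its variance is approximately t. This is why the paper's scale parameter is a variance rather than a width. An alternating ±1 pattern at the finest sampling scale is almost entirely suppressed at t = 4, away from the boundaries.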
A multi-scale representation by itself, however, contains no explicit information
about what image structures should be regarded as significant or what scales are
appropriate for treating those. Hence, unless early judgements can be made about what
image structures should be regarded as important, we obtain a substantial expansion
of the amount of data to be interpreted by later stage processes. In most previous
works, this problem has been handled by formulating algorithms which rely on the
information present in the data at a small set of manually chosen scales (or even a
single scale). Alternatively, coarse-to-fine algorithms have been expressed, which start
at a given coarse scale and propagate down to a given finer scale. Determining such
scales in advance, however, leads to the introduction of free parameters. If one aims at
autonomous algorithms which are to operate in a complex environment without need
for external parameter tuning, we therefore argue that it is essential to complement
traditional multi-scale processing by explicit mechanisms for scale selection. Notably,
image descriptors can be highly unstable if computed at inappropriately chosen scales,
whereas a proper tuning of the scale parameter can improve the quality of an image
descriptor substantially. As will be demonstrated later, local scale information can
also constitute an important cue in its own right.
Early work addressing this problem was presented in (Lindeberg 1991, 1993a)
for blob-like image structures. The basic idea was to study the behaviour of image
structures over scales, and to measure the saliency of image structures from the
stability properties and the lifetime of these structures in scale-space. Scale levels
were selected from the scales at which a measure of blob strength assumed local
maxima over scales, and significant image structures were selected from the stability
of the blob structures in scale-space. Experimentally, it was shown that this approach
could be used for extracting regions of interest with associated scale levels, which in
turn could serve to guide various early visual processes.
The aim of this article is to address the problem of automatic scale selection
in a more general setting, for wider classes of image descriptors. We shall be
concerned with the problem of extracting image features and computing filter-like image
descriptors, and present a scale selection principle for image descriptors which can be
expressed in terms of Gaussian derivative filters. The general idea for scale selection
that will be proposed is to study the evolution properties over scales of normalized
differential descriptors. Specifically, it will be suggested that local extrema over scales of
such normalized differential entities are likely to correspond
to interesting image structures. By theoretical considerations and experiments it will
be shown that this approach gives rise to intuitively reasonable results in different
situations and that it provides a unified framework for scale selection for detecting
image features such as blobs, corners, edges and ridges.
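The following minimal sketch illustrates this principle under three assumptions chosen only for illustration. The input is a one-dimensional Gaussian blob f = g(·; t0), the differential entity is the γ-normalized second derivative t^γ L_xx, and the entity is evaluated at the blob centre. By the semi-group property of Gaussian smoothing, the scale-space representation of this blob is g(·; t0 + t). The signature over scales therefore has a closed form, which the code scans on a logarithmic grid. The function names are hypothetical.

```python
import math


def gauss_1d(x: float, t: float) -> float:
    """1-D Gaussian kernel g(x; t) with variance t."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def normalized_lxx_at_centre(t: float, t0: float, gamma: float = 1.0) -> float:
    """gamma-normalized second derivative t**gamma * L_xx(0; t) for the blob f = g(.; t0).

    Smoothing g(.; t0) with g(.; t) gives g(.; t0 + t) (semi-group property),
    and the second x-derivative of g(x; s) at x = 0 equals -g(0; s) / s.
    """
    s = t0 + t
    return t ** gamma * (-gauss_1d(0.0, s) / s)


def select_scale(t0: float, gamma: float = 1.0) -> float:
    """Scale on a log-spaced grid (t from 1e-2 to 1e3) where |t**gamma * L_xx| peaks."""
    scales = [10.0 ** (k / 200.0) for k in range(-400, 601)]
    return max(scales, key=lambda t: abs(normalized_lxx_at_centre(t, t0, gamma)))


if __name__ == "__main__":
    for t0 in (1.0, 4.0, 16.0):
        print(f"blob variance t0 = {t0:5.1f} -> selected scale t = {select_scale(t0):.2f}")
```

To find the maximum, set the derivative of γ ln t − (3/2) ln(t0 + t) to zero. This gives t = γ t0 / (3/2 − γ) for γ < 3/2, so with γ = 1 the selected scale is t = 2 t0. The selected scale therefore grows in proportion to the blob size and follows rescalings of the input pattern.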
1.1 Outline of the presentation
The presentation is organized as follows: Section 2 reviews the main concepts from
scale-space theory we build upon. Section 3 introduces the notion of normalized
derivatives and illustrates how maxima over scales of normalized Gaussian
derivatives reflect the frequency content in sine wave patterns. This material serves as a
preparation for section 4, which presents the proposed scale selection methodology
and shows how it applies generally to a large class of differential descriptors.
Section 4 also proposes a general extension of the common idea of defining features as
zero-crossings of spatial differential descriptors. If a scale selection mechanism is
integrated into such a feature detector, this corresponds to adding another zero-crossing
requirement over the scale dimension in the differential feature definition.
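A short closed-form sketch can preview the sine-wave analysis described for Section 3. It relies on two standard facts, stated here as assumptions rather than quoted from this section:

- Under the diffusion equation of Section 2, the scale-space representation of f(x) = sin(ωx) is exp(-ω²t/2) sin(ωx).
- The γ-normalized m-th derivative multiplies ∂_x^m L by t^(γm/2).

The code scans the resulting amplitude over a logarithmic grid of scales to locate its maximum. The helper names are hypothetical.

```python
import math


def normalized_amplitude(t: float, omega: float, m: int, gamma: float = 1.0) -> float:
    """Amplitude of the gamma-normalized m-th derivative of L for f(x) = sin(omega x).

    L(x; t) = exp(-omega**2 * t / 2) * sin(omega * x) solves the diffusion equation,
    so d^m L / dx^m has amplitude omega**m * exp(-omega**2 * t / 2); the gamma-
    normalization multiplies this by t**(gamma * m / 2).
    """
    return t ** (gamma * m / 2.0) * omega ** m * math.exp(-omega * omega * t / 2.0)


def scale_of_max(omega: float, m: int, gamma: float = 1.0) -> float:
    """Scale on a log-spaced grid (t from 1e-4 to 1e4) where the amplitude peaks."""
    scales = [10.0 ** (k / 500.0) for k in range(-2000, 2001)]
    return max(scales, key=lambda t: normalized_amplitude(t, omega, m, gamma))


if __name__ == "__main__":
    for omega in (0.25, 1.0, 4.0):
        t_hat = scale_of_max(omega, m=1)
        print(f"omega = {omega}: t_max = {t_hat:.4f}, m/omega**2 = {1.0 / omega**2:.4f}")
```

To find the peak, set the derivative of (γm/2) ln t − ω²t/2 to zero. This gives t_max = γm/ω², that is σ_max = √(γm)/ω. For fixed γ and m, the selected scale is therefore proportional to the wavelength 2π/ω, which is the sense in which maxima over scales reflect the frequency content.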
Then, section 5 and section 6 show in detail how these ideas can be used for
formulating blob detectors and corner detectors with automatic scale selection. Section 8
shows an example of how this approach applies to the computation of dense feature
maps. Section 9 describes different ways of interpreting the normalized derivative
concept. Finally, section 10 summarizes the main results and ideas of the approach. In a
complementary paper (Lindeberg 1996a) it is developed in detail how this approach
applies to edge detection and ridge detection.
Earlier presentations of different parts of this material have appeared elsewhere
(Lindeberg 1993b, 1994a, 1994d, 1996b) as well as applications of the general ideas to
various problem domains (Lindeberg and Gårding 1993, 1997; Gårding and Lindeberg
1996; Lindeberg and Li 1995, 1997; Bretzner and Lindeberg 1998, 1997; Almansa and
Lindeberg 1996; Wiltschi et al. 1997; Lindeberg 1997). The subject of this paper is to
present a coherent description of the proposed scale selection methodology in journal
form, including the developments and refinements that have been performed since the
earliest presented manuscripts.
2 Scale-space representation: Review
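Given any continuous signal $f : \mathbb{R}^D \to \mathbb{R}$, the (linear) scale-space representation
$L : \mathbb{R}^D \times \mathbb{R}_+ \to \mathbb{R}$ of $f$ is defined as the solution to the diffusion equation
$$\partial_t L = \frac{1}{2}\nabla^2 L = \frac{1}{2}\sum_{i=1}^{D} \partial_{x_i x_i} L \tag{1}$$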
with initial condition $L(\cdot;\, 0) = f(\cdot)$. Equivalently, this family can be defined by
convolution with Gaussian kernels of various widths $t$
$$L(\cdot;\, t) = g(\cdot;\, t) * f(\cdot), \tag{2}$$
where $g : \mathbb{R}^D \times \mathbb{R}_+ \to \mathbb{R}$ is given by
$$g(x;\, t) = \frac{1}{(2\pi t)^{D/2}}\, e^{-(x_1^2 + \dots + x_D^2)/(2t)}, \tag{3}$$
and $x = (x_1, \dots, x_D)^T$. There are several mathematical results (Koenderink 1984;
Babaud et al. 1986; Yuille and Poggio 1986; Lindeberg 1990, 1994d, 1994b;
Koenderink and van Doorn 1990, 1992; Florack et al. 1992; Florack 1993; Florack et al.
1994; Pauwels et al. 1995) stating that within the class of linear transformations the
Gaussian kernel is the unique kernel for generating a scale-space. The conditions that
specify the uniqueness are essentially linearity and shift invariance combined with
different ways of formalizing the notion that new structures should not be created in
the transformation from a finer to a coarser scale.
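The sketch below makes equations (1) to (3) concrete in plain Python. It does three things:

- It evaluates the kernel (3) for arbitrary dimension D, with normalization factor (2πt)^(-D/2).
- In one dimension, it uses finite differences to check that the kernel satisfies the diffusion equation (1) with the factor 1/2, and not with factor 1.
- For D = 2, it uses a Riemann sum to check that the kernel integrates to one and factors into one-dimensional kernels.

The function names and step sizes are our own choices, not part of the paper.

```python
import math


def gaussian_kernel(x: tuple[float, ...], t: float) -> float:
    """Equation (3): g(x; t) = exp(-|x|^2 / (2t)) / (2*pi*t)**(D/2), where D = len(x)."""
    d = len(x)
    r2 = sum(xi * xi for xi in x)
    return math.exp(-r2 / (2.0 * t)) / (2.0 * math.pi * t) ** (d / 2.0)


def diffusion_residual(x: float, t: float, c: float = 0.5, h: float = 1e-3) -> float:
    """Central-difference estimate of dg/dt - c * d2g/dx2 for the 1-D kernel (D = 1).

    With c = 1/2, as in equation (1), the residual vanishes up to O(h^2) errors.
    """
    dg_dt = (gaussian_kernel((x,), t + h) - gaussian_kernel((x,), t - h)) / (2.0 * h)
    d2g_dx2 = (gaussian_kernel((x + h,), t) - 2.0 * gaussian_kernel((x,), t)
               + gaussian_kernel((x - h,), t)) / (h * h)
    return dg_dt - c * d2g_dx2


def riemann_mass_2d(t: float, half_width: float = 8.0, step: float = 0.1) -> float:
    """Riemann-sum approximation of the integral of g(.; t) over R^2 (D = 2)."""
    n = int(round(half_width / step))
    pts = [k * step for k in range(-n, n + 1)]
    return step * step * sum(gaussian_kernel((a, b), t) for a in pts for b in pts)


if __name__ == "__main__":
    print("residual with c = 1/2:", diffusion_residual(0.5, 1.0))
    print("residual with c = 1:  ", diffusion_residual(0.5, 1.0, c=1.0))
    print("2-D mass:", riemann_mass_2d(1.0))
    # Separability: the 2-D kernel equals the product of two 1-D kernels.
    print(gaussian_kernel((0.3, -1.2), 2.0),
          gaussian_kernel((0.3,), 2.0) * gaussian_kernel((-1.2,), 2.0))
```

The factor 1/2 in (1) and the variance parameterization in (3) belong together. With the factor 1 instead, the residual stays clearly non-zero.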
Interestingly, the results from these theoretical considerations are in qualitative
agreement with the results of biological evolution. Neurophysiological studies by
Young (1985, 1987) have shown that there are receptive fields in the mammalian
retina and visual cortex whose measured response profiles can be well modelled by
Gaussian derivatives up to order four. In these respects, the scale-space
representation with its associated Gaussian derivative operators (where α denotes the order of
differentiation)
$$L_{x^\alpha}(\cdot;\, t) = (\partial_{x^\alpha} L)(\cdot;\, t) = \partial_{x^\alpha}(g * f) = (\partial_{x^\alpha} g) * f = g * (\partial_{x^\alpha} f), \tag{4}$$
can be seen as a canonical idealized model of a visual front-end. It is for this
multi-scale representation concept that we will develop the scale selection methodology.
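Equation (4) states that differentiation commutes with Gaussian smoothing. Derivatives of L can therefore be computed by convolving f with Gaussian derivative kernels. The following sketch checks the first-order case numerically in one dimension. It uses plain Riemann sums for the convolution integrals and tanh as an arbitrary smooth test signal (our choice). The helper names are hypothetical.

```python
import math
from typing import Callable


def g(u: float, t: float) -> float:
    """1-D Gaussian kernel with variance t."""
    return math.exp(-u * u / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def g_x(u: float, t: float) -> float:
    """First derivative of the kernel: d/du g(u; t) = -(u / t) * g(u; t)."""
    return -u / t * g(u, t)


def convolve_at(kernel: Callable[[float, float], float], f: Callable[[float], float],
                x: float, t: float, half_width: float = 10.0, step: float = 1e-3) -> float:
    """Riemann-sum approximation of (kernel * f)(x) = integral of kernel(u; t) f(x - u) du."""
    n = int(round(half_width / step))
    return step * sum(kernel(k * step, t) * f(x - k * step) for k in range(-n, n + 1))


def three_ways(f: Callable[[float], float], df: Callable[[float], float],
               x: float, t: float, h: float = 1e-3) -> tuple[float, float, float]:
    """Return (d/dx (g * f), (d/dx g) * f, g * (d/dx f)) evaluated at x."""
    smoothed_then_diff = (convolve_at(g, f, x + h, t) - convolve_at(g, f, x - h, t)) / (2.0 * h)
    diff_kernel = convolve_at(g_x, f, x, t)
    diff_signal = convolve_at(g, df, x, t)
    return smoothed_then_diff, diff_kernel, diff_signal


if __name__ == "__main__":
    # tanh is an arbitrary smooth test signal; its derivative is 1 - tanh^2.
    print(three_ways(math.tanh, lambda v: 1.0 - math.tanh(v) ** 2, 0.7, 1.5))
```

All three quantities agree up to discretization error. In practice, this is why one usually convolves with a derivative-of-Gaussian kernel instead of differentiating noisy data directly.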
3 Normalized derivatives and intuitive idea for scale selection
A well-known property of the scale-space representation is that the amplitude of
spatial derivatives
$$L_{x^\alpha}(\cdot;\, t) = \partial_{x^\alpha} L(\cdot;\, t) = \partial_{x_1^{\alpha_1}} \cdots \partial_{x_D^{\alpha_D}} L(\cdot;\, t) \tag{5}$$

Citations
Book Chapter
07 May 2006
TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.

13,011 citations


Cites background or methods from "Feature Detection with Automatic Sc..."

  • ...The detector is based on the Hessian matrix [11, 1], but uses a very basic approximation, just as DoG [2] is a very basic Laplacian-based detector....


  • ...Lindeberg introduced the concept of automatic scale selection [1]....


Journal Article
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

12,449 citations


Cites background or methods from "Feature Detection with Automatic Sc..."

  • ...In contrast to the Hessian-Laplace detector by Mikolajczyk and Schmid [26], we rely on the determinant of the Hessian also for the scale selection, as done by Lindeberg [21]....


  • ...of the Hessian also for the scale selection, as done by Lindeberg [21]....


  • ...Lindeberg [21] introduced the concept of automatic scale selection....


Journal Article
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations


Cites background from "Feature Detection with Automatic Sc..."

  • ...Lindeberg [23] has developed a scale-invariant "blob" detector, where a "blob" is defined by...


Journal Article
TL;DR: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation and presents state-of-the-art algorithms for both of these tasks.
Abstract: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.

5,068 citations

Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques. Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/.
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations


Cites background from "Feature Detection with Automatic Sc..."

  • ...They compare a number of feature detectors (Harris-Laplace (Mikolajczyk and Schmid 2004) and Laplacian (Lindeberg 1998b)), descriptors (SIFT, RIFT, and SPIN (Lazebnik et al....


References
Journal Article
TL;DR: It is proven that the local maxima of the wavelet transform modulus detect the locations of irregular structures and provide numerical procedures to compute their Lipschitz exponents.
Abstract: The mathematical characterization of singularities with Lipschitz exponents is reviewed. Theorems that estimate local Lipschitz exponents of functions from the evolution across scales of their wavelet transform are reviewed. It is then proven that the local maxima of the wavelet transform modulus detect the locations of irregular structures and provide numerical procedures to compute their Lipschitz exponents. The wavelet transform of singularities with fast oscillations has a particular behavior that is studied separately. The local frequency of such oscillations is measured from the wavelet transform modulus maxima. It has been shown numerically that one- and two-dimensional signals can be reconstructed, with a good approximation, from the local maxima of their wavelet transform modulus. As an application, an algorithm is developed that removes white noises from signals by analyzing the evolution of the wavelet transform maxima across scales. In two dimensions, the wavelet transform maxima indicate the location of edges in images.

4,064 citations


"Feature Detection with Automatic Sc..." refers background or methods in this paper

  • ...A main difference between the scale selection mechanism suggested here and the work in (Lindeberg, 1991; Mallat and Hwang, 1992), however, is that here it is shown how these notions can be applied to large classes of non-linear differential invariants computed in a scale-space representation....


  • ...An analysis of scale-space like responses to sine waves corresponding to the case when γ = 1 in this section has also been performed in wavelet analysis by (Mallat and Hwang, 1992); see Section 9....


Book
11 Aug 2011
TL;DR: The authors describe an algorithm that reconstructs a close approximation of 1-D and 2-D signals from their multiscale edges and shows that the evolution of wavelet local maxima across scales characterize the local shape of irregular structures.
Abstract: A multiscale Canny edge detection is equivalent to finding the local maxima of a wavelet transform. The authors study the properties of multiscale edges through the wavelet theory. For pattern recognition, one often needs to discriminate different types of edges. They show that the evolution of wavelet local maxima across scales characterize the local shape of irregular structures. Numerical descriptors of edge types are derived. The completeness of a multiscale edge representation is also studied. The authors describe an algorithm that reconstructs a close approximation of 1-D and 2-D signals from their multiscale edges. For images, the reconstruction errors are below visual sensitivity. As an application, a compact image coding algorithm that selects important edges and compresses the image data by factors over 30 has been implemented.

3,187 citations

Journal Article
TL;DR: The results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy into first- order redundancy.
Abstract: The relative efficiency of any particular image-coding scheme should be defined only in relation to the class of images that the code is likely to encounter. To understand the representation of images by the mammalian visual system, it might therefore be useful to consider the statistics of images from the natural environment (i.e., images with trees, rocks, bushes, etc). In this study, various coding schemes are compared in relation to how they represent the information in such natural images. The coefficients of such codes are represented by arrays of mechanisms that respond to local regions of space, spatial frequency, and orientation (Gabor-like transforms). For many classes of image, such codes will not be an efficient means of representing information. However, the results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy (e.g., correlation between the intensities of neighboring pixels) into first-order redundancy (i.e., the response distribution of the coefficients). Such coding produces a relatively high signal-to-noise ratio and permits information to be transmitted with only a subset of the total number of cells. These results support Barlow's theory that the goal of natural vision is to represent the information in the natural environment with minimal redundancy.

3,077 citations


"Feature Detection with Automatic Sc..." refers background in this paper

  • ...It is well-known that natural images often show a qualitative behaviour similar to this (Field, 1987)....


Book Chapter
01 Jan 1987
TL;DR: Scale-space filtering is a method that describes signals qualitatively, managing the ambiguity of scale in an organized and natural way.
Abstract: The extrema in a signal and its first few derivatives provide a useful general-purpose qualitative description for many kinds of signals. A fundamental problem in computing such descriptions is scale: a derivative must be taken over some neighborhood, but there is seldom a principled basis for choosing its size. Scale-space filtering is a method that describes signals qualitatively, managing the ambiguity of scale in an organized and natural way. The signal is first expanded by convolution with gaussian masks over a continuum of sizes. This "scale-space" image is then collapsed, using its qualitative structure, into a tree providing a concise but complete qualitative description covering all scales of observation. The description is further refined by applying a stability criterion, to identify events that persist over large changes in scale.

3,008 citations