Feature Detection with Automatic Scale Selection

doi:10.1023/A:1008045108935



Tony Lindeberg

01 Nov 1998, International Journal of Computer Vision (Kluwer Academic Publishers), Vol. 30, Iss. 2, pp. 79-116

TL;DR: It is shown how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation and how it can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure.


Abstract: The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select locally appropriate scales for further analysis. This article proposes a systematic methodology for dealing with this problem. A framework is presented for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γ-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure. Support for the proposed approach is given in terms of a general theoretical investigation of the behaviour of the scale selection method under rescalings of the input pattern and by integration with different types of early visual modules, including experiments on real-world and synthetic data. Support is also given by a detailed analysis of how different types of feature detectors perform when integrated with a scale selection mechanism and then applied to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation. In many computer vision applications, the poor performance of the low-level vision modules constitutes a major bottleneck.
It is argued that the inclusion of mechanisms for automatic scale selection is essential if we are to construct vision systems to automatically analyse complex unknown environments.


Summary


1 Introduction

One of the most fundamental problems that arises when analysing real-world measurement data originates from the fact that objects in the world may appear in different ways depending upon the scale of observation.
Notably, the type of physical description that is obtained may be strongly dependent on the scale at which the world is modelled, and this is in clear contrast to certain idealized mathematical entities, such as “point” or “line”, which are independent of the scale of observation.
Under other circumstances, however, it may not be at all obvious how to determine the proper scales in advance.
A main intention behind this construction is to obtain a separation of the image structures in the original image, such that fine scale image structures only exist at the finest scales in the multi-scale representation.
This article addresses the problem of automatic scale selection in a more general setting, for wider classes of image descriptors.

1.1 Outline of the presentation

The presentation is organized as follows: Section 2 reviews the main concepts from scale-space theory the authors build upon.
Section 3 introduces the notion of normalized derivatives and illustrates how maxima over scales of normalized Gaussian derivatives reflect the frequency content in sine wave patterns.
This material serves as a preparation for section 4, which presents the proposed scale selection methodology and shows how it applies generally to a large class of differential descriptors.
Section 8 shows an example of how this approach applies to the computation of dense feature maps.
A complementary paper (Lindeberg 1996a) develops in detail how this approach applies to edge detection and ridge detection.

2 Scale-space representation: Review

There are several mathematical results (Koenderink 1984; Babaud et al.
Interestingly, the results from these theoretical considerations are in qualitative agreement with the results of biological evolution.
Neurophysiological studies by Young (1985, 1987) have shown that there are receptive fields in the mammalian retina and visual cortex whose measured response profiles can be well modelled by Gaussian derivatives up to order four.

3 Normalized derivatives and intuitive idea for scale selection

This is a direct consequence of the non-enhancement property of local extrema, which states that the value at a local maximum cannot increase, and the value at a local minimum cannot decrease.
In practice, it means that the amplitude of the variations in a signal will always decrease with scale.
In the special case when γ = 1, these ξ-coordinates (ξ = x/t^{γ/2}) and their associated normalized derivative operator ∂ξ = t^{γ/2} ∂x are dimensionless.
As will be shown later, however, values of γ < 1 are highly useful when formulating scale selection mechanisms for edge detection and ridge detection.
For the sinusoidal signal $f(x) = \sin \omega_0 x$, the amplitude of an $m$th order normalized derivative as a function of scale is then given by
$$L_{\xi^m,\max}(t) = t^{m\gamma/2}\,\omega_0^m\,e^{-\omega_0^2 t/2}, \qquad (11)$$
i.e., it first increases and then decreases.
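As a worked check of (11), setting the derivative of its logarithm with respect to t to zero gives mγ/(2t) − ω0²/2 = 0. The maximum over scales is therefore attained at t_max = γm/ω0², i.e., σ_max = √t_max = √(γm)/ω0, which is proportional to the wavelength 2π/ω0. The short sketch below uses NumPy only, and its function names are illustrative rather than taken from the paper. It compares this closed form with a brute-force search over a dense grid of scales for the three frequencies used in figure 1.

```python
import numpy as np


def normalized_amplitude(t, omega, m, gamma):
    """Eq. (11): amplitude of the m-th order gamma-normalized derivative of sin(omega x)."""
    return t ** (m * gamma / 2.0) * omega ** m * np.exp(-omega ** 2 * t / 2.0)


def t_max_closed_form(omega, m, gamma):
    """Zero of d/dt log(eq. 11) = m*gamma/(2t) - omega^2/2."""
    return gamma * m / omega ** 2


scales = np.linspace(0.01, 20.0, 200_000)
for omega in (0.5, 1.0, 2.0):  # the frequencies of figure 1
    numeric = scales[np.argmax(normalized_amplitude(scales, omega, m=1, gamma=1.0))]
    print(omega, numeric, t_max_closed_form(omega, m=1, gamma=1.0))
```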

4 Proposed methodology for scale selection

The example above shows that, for a sinusoidal signal, the scale at which a normalized derivative assumes its maximum over scales is proportional to the wavelength of the signal.
Maxima over scales of normalized derivatives reflect the scales over which spatial variations take place in the signal.
This operation corresponds to an interesting computational structure, since it constitutes a way of estimating length based on local measurements performed at only a single spatial point in the scale-space representation, without explicitly laying out a ruler.
Moreover, compared to a local windowed Fourier transform, there is no need to explicitly set a window size for computing the Fourier transform.
This principle is closely related, although not equivalent, to the scale selection method previously proposed in (Lindeberg 1991, 1993a), where interesting scale levels were determined from maxima over scales of a normalized blob measure.
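The single-point length estimate can be illustrated on a sampled sine wave. The sketch below assumes m = 1 and γ = 1 and evaluates the Gaussian-derivative convolution at one sample only. Setting the t-derivative of the logarithm of (11) to zero gives t_max = γm/ω0², and the sketch inverts this relation to obtain a wavelength. For a sine wave, the factor cos(ω0 x0) in the first derivative does not depend on t, so the selected scale is the same at any point where that factor is non-zero. All function names are illustrative.

```python
import numpy as np


def gaussian_first_derivative_kernel(t, radius):
    """Sampled derivative of g(x; t) = exp(-x^2/(2t)) / sqrt(2 pi t)."""
    k = np.arange(-radius, radius + 1)
    g = np.exp(-k ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return k, -k / t * g


def normalized_derivative_at(signal, x0, t, gamma=1.0):
    """t^(gamma/2) L_x(x0; t), with the convolution evaluated at the single sample x0."""
    radius = int(np.ceil(4.0 * np.sqrt(t)))
    k, gx = gaussian_first_derivative_kernel(t, radius)
    # (f * g_x)(x0) = sum_k f(x0 - k) g_x(k)
    return t ** (gamma / 2.0) * np.dot(signal[x0 - k], gx)


def estimate_wavelength(signal, x0, scales, gamma=1.0):
    """Wavelength from the scale that maximizes |t^(gamma/2) L_x| at x0 (m = 1)."""
    responses = [abs(normalized_derivative_at(signal, x0, t, gamma)) for t in scales]
    t_max = scales[int(np.argmax(responses))]
    # t_max = gamma * m / omega^2 with m = 1, hence lambda = 2 pi sqrt(t_max / gamma)
    return 2.0 * np.pi * np.sqrt(t_max / gamma)


x = np.arange(401)
scales = np.linspace(0.5, 100.0, 996)  # step 0.1
for wavelength in (16.0, 32.0):
    signal = np.sin(2.0 * np.pi * x / wavelength)
    print(wavelength, estimate_wavelength(signal, x0=205, scales=scales))
```

Only one sample position enters each estimate. No window size has to be chosen, because the Gaussian kernel support follows from t.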

4.1 General scaling property of local maxima over scales

A basic justification for the abovementioned arguments can be obtained from the fact that for a large class of (possibly non-linear) combinations of normalized derivatives it holds that maxima over scales have a nice behaviour under rescalings of the intensity pattern.
Let us hence restrict the analysis to polynomial differential invariants which are homogeneous in the sense that the sum of the orders of differentiation is the same for each term in the polynomial.
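For a differential expression of this form, the corresponding normalized differential expressions in the two domains are given by
$$\mathcal{D}_{\gamma\text{-norm}} L = t^{M\gamma/2}\, \mathcal{D} L, \qquad (26)$$
$$\mathcal{D}'_{\gamma\text{-norm}} L' = t'^{M\gamma/2}\, \mathcal{D}' L'. \qquad (27)$$
From (23) it follows that these normalized differential expressions are related by
$$\mathcal{D}_{\gamma\text{-norm}} L = s^{M(1-\gamma)}\, \mathcal{D}'_{\gamma\text{-norm}} L'. \qquad (28)$$
Clearly, with γ-normalization for γ ≠ 1, the magnitude of the derivative is not scale invariant; only for γ = 1 does the factor $s^{M(1-\gamma)}$ reduce to one. Nevertheless, since this factor does not depend on t, a local maximum over scales at $t_0$ in one domain corresponds to a local maximum over scales at $t_0' = s^2 t_0$ in the other.
Hence, even when γ ≠ 1, the authors can achieve sufficient scale invariance to support the proposed scale selection methodology.
From the transformation property (23), it is, however, apparent that this magnitude measure will be strongly dependent on the scale at which the maximum over scales is assumed.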
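Relation (28) can be checked numerically on the sine-wave example of section 3. There, a rescaling x′ = s x turns sin(ω0 x) into sin((ω0/s) x′). The sketch below is an illustration, not the paper's own experiment. It maximizes (11) over a fine logarithmic grid of scales in both domains. The selected scales should differ by the factor s² for any γ, while the maximal values should differ by s^{M(1−γ)}, which equals one only for γ = 1.

```python
import numpy as np


def max_over_scales(omega, m, gamma, scales):
    """Maximum over scales of eq. (11) and the scale at which it is attained."""
    amplitude = scales ** (m * gamma / 2.0) * omega ** m * np.exp(-omega ** 2 * scales / 2.0)
    i = int(np.argmax(amplitude))
    return scales[i], amplitude[i]


scales = np.geomspace(1e-3, 1e3, 400_001)
m, s, omega = 2, 3.0, 1.5  # derivative order M = m, rescaling factor s
for gamma in (1.0, 0.75):
    t_a, v_a = max_over_scales(omega, m, gamma, scales)      # f(x)   = sin(omega x)
    t_b, v_b = max_over_scales(omega / s, m, gamma, scales)  # f'(x') = sin(omega x'/s), x' = s x
    # Expect t_b / t_a = s^2 and v_a / v_b = s^(m (1 - gamma)).
    print(gamma, t_b / t_a, v_a / v_b, s ** (m * (1 - gamma)))
```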

4.2 The scale selection mechanism in practice

So far the authors have proposed a general methodology for scale selection by detecting local maxima in feature responses over scales.
Here, no attempt will be made to answer the question of which differential expression to use.
Let us instead contend that the differential expression should at least be chosen so as to capture the types of image structures under consideration.
The general approach to scale selection that will be proposed is to use these maximal responses over scales in the stage of detecting image features, i.e., when establishing the existence of different types of image structures.
The suggested framework naturally gives rise to two-stage algorithms, with feature detection at coarse scales followed by feature localization at finer scales.

4.3 Experiments: Scale-space signatures from real data

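(To avoid sensitivity to the sign of these entities, and hence to the polarity of the signal, $\operatorname{trace}\mathcal{H}_{norm}L$ and $\det\mathcal{H}_{norm}L$ have been squared before presentation.)
These graphs are called the scale-space signatures of $(\operatorname{trace}\mathcal{H}_{norm}L)^2$ and $(\det\mathcal{H}_{norm}L)^2$, respectively.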
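One minimal way to compute such signatures is to evaluate the Gaussian derivatives at a single point as a windowed sum. The sketch below assumes γ = 1, i.e., trace H_norm L = t(L_xx + L_yy) and det H_norm L = t²(L_xx L_yy − L_xy²). It replaces the sunflower images with a synthetic Gaussian blob of variance t0, which is our own choice. Smoothing that blob gives a Gaussian of variance t0 + t, so both squared signatures can be shown analytically to peak at t = t0, and the numerical search should recover that value.

```python
import numpy as np


def gaussian_kernels(t, radius):
    """Sampled 1-D Gaussian and its first and second derivatives at variance t."""
    k = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-k ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return g, -k / t * g, (k ** 2 / t ** 2 - 1.0 / t) * g


def hessian_at(image, row, col, t):
    """(L_xx, L_yy, L_xy) at one pixel from separable Gaussian derivative kernels."""
    r = int(np.ceil(4.0 * np.sqrt(t)))
    g0, g1, g2 = gaussian_kernels(t, r)
    # Flip the patch so that sum(patch * kernel) is a convolution, not a correlation.
    patch = image[row - r:row + r + 1, col - r:col + r + 1][::-1, ::-1]
    lxx = np.sum(patch * np.outer(g0, g2))  # rows ~ y, columns ~ x
    lyy = np.sum(patch * np.outer(g2, g0))
    lxy = np.sum(patch * np.outer(g1, g1))
    return lxx, lyy, lxy


def hessian_signatures(image, row, col, scales):
    """Squared trace and determinant of the gamma = 1 normalized Hessian over scales."""
    trace_sig, det_sig = [], []
    for t in scales:
        lxx, lyy, lxy = hessian_at(image, row, col, t)
        trace_sig.append((t * (lxx + lyy)) ** 2)
        det_sig.append((t ** 2 * (lxx * lyy - lxy ** 2)) ** 2)
    return np.array(trace_sig), np.array(det_sig)


# Synthetic blob of variance t0 = 16 (standard deviation 4) at the image centre.
t0, n = 16.0, 129
y, x = np.mgrid[0:n, 0:n] - n // 2
blob = np.exp(-(x ** 2 + y ** 2) / (2.0 * t0))
scales = np.arange(1.0, 64.5, 0.5)
trace_sig, det_sig = hessian_signatures(blob, n // 2, n // 2, scales)
print(scales[np.argmax(trace_sig)], scales[np.argmax(det_sig)])
```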
This example illustrates that results in agreement with the proposed scale selection principle can be obtained also for real-world data (and for signals having a much richer frequency content than a single sine wave).
The reason why these particular differential expressions have been selected here is because they constitute differential entities useful for blob detection; see e.g.

4.4 Simultaneous detection of interesting points and scales

In figure 2, the signatures of the normalized differential entities were computed at the central point in each image.
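These points were deliberately chosen to coincide with the centers of the sunflowers, where the blob response can be expected to be maximal under spatial perturbations.
Specific examples of this idea, that is, detecting points that are simultaneously local maxima of a normalized differential entity with respect to both space and scale (scale-space maxima), will be worked out in more detail in the following sections.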
Referring to the invariance properties of local maxima over scales under rescalings of the input signal, the authors can observe that they transfer trivially to scale-space maxima.

5 Blob detection with automatic scale selection

Every scale-space maximum has been graphically illustrated by a circle centered at the point at which the spatial maximum is assumed, with a radius (measured in pixel units) proportional to the scale at which the maximum is assumed (measured in dimension length, i.e., σ = √t).
To reduce the number of blobs, a threshold on the maximum normalized response has been selected such that the 250 blobs having the maximum normalized responses according to (30) remain.
The bottom row shows the result of superimposing these circles onto a bright copy of the original image, as well as corresponding results for the normalized scale-space extrema of the square of the determinant of the Hessian matrix.
Corresponding experiments for a synthetic pattern (analysed in section 5.1) are given in figure 4.
Observe how these conceptually very simple differential geometric descriptors give a very reasonable description of the blob-like structures in the image (in particular concerning the blob size) considering how little information is used in the processing.
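The detection step can be sketched in a few lines. First, compute the squared normalized Laplacian (t ∇²L)² with γ = 1 over a stack of scales. Then keep the points that are strict local maxima among their 26 neighbours in space and scale, and rank them by response. The image with two Gaussian blobs, the logarithmic scale sampling and the number of retained blobs are illustrative assumptions, not the settings used in the experiments described above.

```python
import numpy as np


def gaussian_kernel(t, order, radius):
    """Sampled 1-D Gaussian (order 0) or its second derivative (order 2) at variance t."""
    k = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-k ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return g if order == 0 else (k ** 2 / t ** 2 - 1.0 / t) * g


def filter2d(image, t, order_y, order_x):
    """Separable Gaussian derivative filtering with replicated borders."""
    r = int(np.ceil(4.0 * np.sqrt(t)))
    padded = np.pad(image, r, mode="edge")
    tmp = np.apply_along_axis(np.convolve, 1, padded, gaussian_kernel(t, order_x, r), mode="valid")
    return np.apply_along_axis(np.convolve, 0, tmp, gaussian_kernel(t, order_y, r), mode="valid")


def detect_blobs(image, scales, n_strongest):
    """Strongest scale-space maxima of (t * Laplacian L)^2 (gamma = 1)."""
    stack = np.stack([(t * (filter2d(image, t, 0, 2) + filter2d(image, t, 2, 0))) ** 2
                      for t in scales])
    n_s, n_r, n_c = stack.shape
    padded = np.pad(stack, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones(stack.shape, dtype=bool)
    for ds in (-1, 0, 1):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if ds == dr == dc == 0:
                    continue
                neighbour = padded[1 + ds:1 + ds + n_s, 1 + dr:1 + dr + n_r, 1 + dc:1 + dc + n_c]
                is_max &= stack > neighbour
    idx = np.argwhere(is_max)  # (scale index, row, column) in C order
    strength = stack[is_max]   # same C order as idx
    best = np.argsort(strength)[::-1][:n_strongest]
    # Each blob: (row, column, selected scale t, response); draw a circle of radius ~ sqrt(t).
    return [(int(idx[i, 1]), int(idx[i, 2]), float(scales[idx[i, 0]]), float(strength[i]))
            for i in best]


rows, cols = np.mgrid[0:128, 0:128]
image = (np.exp(-((rows - 32) ** 2 + (cols - 32) ** 2) / (2.0 * 4.0))
         + np.exp(-((rows - 80) ** 2 + (cols - 80) ** 2) / (2.0 * 36.0)))
scales = 2.0 ** np.arange(0.0, 6.01, 0.25)
for row, col, t, response in detect_blobs(image, scales, n_strongest=2):
    print(row, col, round(t, 2), round(response, 3))
```

Both blobs have unit peak intensity, so with γ = 1 they obtain nearly the same normalized response even though their sizes differ by a factor of three. The selected scales track the blob variances, 4 and 36, within the resolution of the scale sampling.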

5.1 Analysis of scale-space maxima for idealized model patterns

Whereas the theoretical analysis in section 4.1 applies generally to large classes of differential invariants and input signals, one may ask how the scale selection method for blob detection performs in specific situations.
The authors shall study two model patterns for which a closed-form solution of the diffusion equation can be calculated and a complete analytical study hence is feasible.³
³ When detecting scale-space maxima in practice, there is, of course, no need to explicitly track the extrema along the extremum path in scale-space.
There is a unique solution when the ratio ω2/ω1 is close to one, and three solutions when the ratio is sufficiently large.

5.2 Comparisons with fixed-scale blob detection

Figure 6 shows the result of computing spatial maxima at different scales in the response of the Laplacian operator from the sine wave pattern in figure 4.
At each scale, the 50 strongest responses have been extracted.
As can be seen, small blobs are given the highest relative ranking at fine scales, whereas large blobs are given the highest relative ranking at coarse scales.
Hence, a blob detector of this type (operating at a single predetermined scale) induces a bias towards image structures of a certain size.
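The size bias of fixed-scale detection can be made concrete with a closed-form case. We substitute this case for the sine-wave pattern of figure 6. For a blob exp(−(x² + y²)/(2t0)), the scale-space representation at scale t is a Gaussian of variance t0 + t, and the magnitude of the normalized Laplacian at the blob centre is 2 t t0/(t0 + t)². The sketch ranks a small and a large blob at two fixed scales and compares the result with the value at the maximum over scales, attained at t = t0.

```python
def normalized_laplacian_at_centre(t, t0):
    """|t * Laplacian L| at the centre of exp(-(x^2 + y^2) / (2 t0)) smoothed to scale t."""
    return 2.0 * t * t0 / (t0 + t) ** 2


blob_variances = {"small": 4.0, "large": 64.0}
for t_fixed in (2.0, 128.0):  # two predetermined detection scales
    ranking = sorted(blob_variances,
                     key=lambda name: normalized_laplacian_at_centre(t_fixed, blob_variances[name]),
                     reverse=True)
    print(f"fixed scale t = {t_fixed}: strongest first -> {ranking}")

# Maximum over scales (attained at t = t0): both blobs obtain the same strength 1/2.
print({name: normalized_laplacian_at_centre(t0, t0) for name, t0 in blob_variances.items()})
```

At t = 2 the small blob ranks first, and at t = 128 the large one does. When each blob is evaluated at its own maximum over scales, both obtain the value 1/2.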
(For the blob detector with automatic scale selection, in contrast, the associated measure of blob strength is, as was shown above, strictly scale invariant.)

5.3 Applications of blob detection with automatic scale selection

Following the previously presented arguments, the authors argue that a scale selection mechanism is an essential complement to any blob detector aimed at handling large size variations in the image structures.
In addition, scale information associated with such adaptively computed image descriptors may serve as an important cue in its own right.
In (Bretzner and Lindeberg 1996, 1998) an application to feature tracking is presented, where (i) the scale information constitutes a key component in the criterion for matching image features over time, and (ii) the scale selection mechanism is essential for the vision system to be able to capture objects under large size variations over time.

6.1 Selection of detection scales from normalized scale-space maxima

A commonly used entity for junction detection is the curvature of level curves in intensity data multiplied by the gradient magnitude (Kitchen and Rosenfeld 1982; Dreschler and Nagel 1982; Koenderink and Richards 1988; Noble 1988; Deriche and Giraudon 1990; Blom 1992; Florack et al. 1992; Lindeberg 1994d).
To reduce the number of junction candidates, the scale-space maxima have been sorted with respect to a saliency measure.
Finally, the 50 most significant junction candidates according to this ranking have been displayed.
Of course, thresholding on the magnitude of the operator response constitutes a coarse selective mechanism for feature detection.
Nevertheless, note that this operation gives rise to a set of junction candidates with reasonable interpretations in the scene.

6.2 Analysis of scale-space maxima for diffuse junction models

To obtain an intuitive understanding of the qualitative behaviour of the scale selection method in this case, let us analyse a simple junction model for which a closed-form analysis can be carried out without too much effort.
Unfortunately, the equation that determines the position of the spatial maximum in κ̃² over scales is non-trivial to handle (it contains a non-linear combination of the Gaussian function, the primitive function of the Gaussian, and polynomials).
This function can be regarded as a coarse model of the behaviour at so coarse scales in scale-space that the shape distortions are substantial and the overall shape of a finite-size object is severely affected.
Hence, selecting scale levels (and spatial points) where κ̃²norm assumes maxima over scales can be expected to give rise to scale levels in the intermediate scale range (where a finite extent junction model constitutes a reasonable approximation).
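To see why the finite extent of the junction matters, consider the infinite diffuse L-junction f(x, y) = Φ(x; t0) Φ(y; t0), with Φ the primitive function of the Gaussian. At the corner point, L_x = L_y = g(0; t0 + t)/2, L_xx = L_yy = 0 and L_xy = g(0; t0 + t)². With γ = 1, this gives (our derivation) κ̃norm = −t²/(8π²(t0 + t)²). Its magnitude increases monotonically with scale, so this infinite model has no maximum over scales at the corner. The sketch checks the closed form numerically with windowed Gaussian-derivative sums. The values of t0 and the image size are arbitrary choices.

```python
import math

import numpy as np


def gaussian_kernels(t, radius):
    """Sampled 1-D Gaussian and its first and second derivatives at variance t."""
    k = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-k ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return g, -k / t * g, (k ** 2 / t ** 2 - 1.0 / t) * g


def kappa_norm_at(image, row, col, t, gamma=1.0):
    """t^(2 gamma) (L_y^2 L_xx - 2 L_x L_y L_xy + L_x^2 L_yy) at one pixel; rows ~ y, columns ~ x."""
    r = int(np.ceil(4.0 * np.sqrt(t)))
    g0, g1, g2 = gaussian_kernels(t, r)
    # Flipping the patch turns the windowed sum into a convolution evaluated at (row, col).
    patch = image[row - r:row + r + 1, col - r:col + r + 1][::-1, ::-1]

    def deriv(k_y, k_x):
        return float(np.sum(patch * np.outer(k_y, k_x)))

    lx, ly = deriv(g0, g1), deriv(g1, g0)
    lxx, lyy, lxy = deriv(g0, g2), deriv(g2, g0), deriv(g1, g1)
    return t ** (2 * gamma) * (ly ** 2 * lxx - 2 * lx * ly * lxy + lx ** 2 * lyy)


t0, n = 4.0, 81
c = n // 2  # the corner sits at the centre pixel
phi = np.array([0.5 * (1.0 + math.erf((i - c) / math.sqrt(2.0 * t0))) for i in range(n)])
junction = np.outer(phi, phi)  # f(x, y) = Phi(x; t0) Phi(y; t0)
for t in (1.0, 4.0, 16.0, 64.0):
    closed_form = -t ** 2 / (8.0 * math.pi ** 2 * (t0 + t) ** 2)
    print(t, kappa_norm_at(junction, c, c, t), closed_form)
```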

6.3 Experiments: Scale-space signatures in junction detection

Figure 9 illustrates these effects for synthetic L-junctions with varying degrees of diffuseness.
In other words, the scale at which the maximum over scales is assumed indicates the spatial extent (the size) of the region for which a junction model is consistent with the grey-level data (in agreement with the suggested scale selection principle).
It shows scale-space maxima of κ̃²norm computed from a synthetic image containing corner structures at different scales.
The original grey-level image is shown in the ground plane, and each scale-space maximum has been graphically visualized by a sphere centered at the position (x0; t0) in scale-space at which the scale-space maximum was assumed.
More results on corner detection, including a complementary mechanism for accurate corner localization, are presented in section 7.

7 Feature localization with automatic scale selection

The scale selection methodology presented so far applies to the detection of image features, and the role of the scale selection mechanism is to estimate the approximate size of the image structures the feature detector responds to.
Whereas this approach provides a conceptually simple way to express various feature detectors, such as a junction detector, which automatically adapts its scale levels to the local image structure, it is not guaranteed that the spatial positions of the scale-space maxima constitute accurate estimates of the corner locations.
The local maxima over scales may be assumed at rather coarse scales, where the drift due to scale-space smoothing is substantial and adjacent features may interfere with each other.
For this reason, it is natural to complement the initial feature detection step by an explicit feature localization stage.
The subject of this section is to show how a mechanism for automatic scale selection can be formulated in this context, by minimizing normalized measures of inconsistency over scales.

7.1 Corner localization by local consistency

Minimizing this expression corresponds to finding the point x that minimizes the weighted integral of the squares of the distances from x to all edge tangent lines lx′ in the neighbourhood (lx′ being the line through x′ perpendicular to the gradient ∇L(x′)); see figure 12.
(Here, Dx′(x) is the distance from x to lx′ multiplied by the gradient magnitude, and the window function gives stronger weights to points in a neighbourhood of x0.)
The overall intention of this formulation is that for an image pattern containing a junction, the point x that minimizes (57) should constitute a better estimate of the projection of the physical junction than x0.
This minimization problem has an explicit solution in terms of local image statistics.
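Written out, the criterion is min over x of Σ_{x′} w(x′) (∇L(x′)ᵀ(x − x′))². This is a quadratic form xᵀAx − 2xᵀb + c with A = Σ w ∇L∇Lᵀ, b = Σ w ∇L∇Lᵀx′ and c = Σ w (∇Lᵀx′)². Its minimizer is x̂ = A⁻¹b, and its minimum value is d_min = c − bᵀA⁻¹b. The sketch below is a discrete version built on our own assumptions: a Gaussian window, gradients from Gaussian derivatives at scale t_grad, and normalization of the residual by trace A.

```python
import numpy as np


def gaussian_kernel(t, order, radius):
    """Sampled 1-D Gaussian (order 0) or its first derivative (order 1) at variance t."""
    k = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-k ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return g if order == 0 else -k / t * g


def gradient(image, t):
    """(L_x, L_y) at scale t by separable filtering with replicated borders."""
    r = int(np.ceil(4.0 * np.sqrt(t)))
    padded = np.pad(image, r, mode="edge")

    def filt(order_y, order_x):
        tmp = np.apply_along_axis(np.convolve, 1, padded, gaussian_kernel(t, order_x, r), mode="valid")
        return np.apply_along_axis(np.convolve, 0, tmp, gaussian_kernel(t, order_y, r), mode="valid")

    return filt(0, 1), filt(1, 0)


def localize_corner(image, x0, t_grad, t_window):
    """Minimize sum w(x') (grad L(x') . (x - x'))^2 over x = (column, row)."""
    lx, ly = gradient(image, t_grad)
    rows, cols = np.indices(image.shape, dtype=float)
    w = np.exp(-((cols - x0[0]) ** 2 + (rows - x0[1]) ** 2) / (2.0 * t_window))
    proj = lx * cols + ly * rows  # grad L(x') . x'
    a = np.array([[np.sum(w * lx * lx), np.sum(w * lx * ly)],
                  [np.sum(w * lx * ly), np.sum(w * ly * ly)]])
    b = np.array([np.sum(w * lx * proj), np.sum(w * ly * proj)])
    c = np.sum(w * proj ** 2)
    x_hat = np.linalg.solve(a, b)      # A^{-1} b
    d_min = c - b @ x_hat              # c - b^T A^{-1} b
    return x_hat, d_min / np.trace(a)  # normalization by trace A is an assumption


image = np.zeros((64, 64))
image[32:, 32:] = 1.0  # ideal corner at (column, row) = (31.5, 31.5)
x_hat, d_norm = localize_corner(image, x0=(34.0, 33.0), t_grad=2.0, t_window=16.0)
print(x_hat, d_norm)
```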

7.2 Automatic selection of localization scales

The formulation in the previous section, however, leaves two major problems open: how to choose the window function, and how to choose the scale at which the gradient directions are computed. Concerning the first problem, let the scale value of the window function be proportional to the detection scale t0 at which the maximum over scales in κ̃²norm was assumed.
Specifically, scale selection by minimizing the normalized residual r̃ (65) over scales corresponds to selecting the scale that minimizes the estimated inaccuracy in the localization estimate.
Thus, when smoothing is necessary (for example, to suppress noise), the residual will decrease with increasing scale.
This can be easily understood by observing that for an ideal polygon-type junction (consisting of regions of uniform grey-level delimited by straight edges), all edge tangents meet at the junction point, which means that the residual d̃min is exactly zero.
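Selecting the localization scale then amounts to repeating the least-squares corner estimate over a set of gradient scales and keeping the one with the smallest normalized residual. The sketch repeats the helpers of a discrete corner localizer so that it runs on its own. It uses a Gaussian window and normalizes the residual by trace A; both are our assumptions. A fixed window variance stands in for a value proportional to the detection scale t0, and the noise level and candidate scales are arbitrary.

```python
import numpy as np


def gaussian_kernel(t, order, radius):
    """Sampled 1-D Gaussian (order 0) or its first derivative (order 1) at variance t."""
    k = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-k ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return g if order == 0 else -k / t * g


def gradient(image, t):
    """(L_x, L_y) at scale t by separable filtering with replicated borders."""
    r = int(np.ceil(4.0 * np.sqrt(t)))
    padded = np.pad(image, r, mode="edge")

    def filt(order_y, order_x):
        tmp = np.apply_along_axis(np.convolve, 1, padded, gaussian_kernel(t, order_x, r), mode="valid")
        return np.apply_along_axis(np.convolve, 0, tmp, gaussian_kernel(t, order_y, r), mode="valid")

    return filt(0, 1), filt(1, 0)


def localize_corner(image, x0, t_grad, t_window):
    """Least-squares corner estimate x = (column, row) and residual normalized by trace A."""
    lx, ly = gradient(image, t_grad)
    rows, cols = np.indices(image.shape, dtype=float)
    w = np.exp(-((cols - x0[0]) ** 2 + (rows - x0[1]) ** 2) / (2.0 * t_window))
    proj = lx * cols + ly * rows
    a = np.array([[np.sum(w * lx * lx), np.sum(w * lx * ly)],
                  [np.sum(w * lx * ly), np.sum(w * ly * ly)]])
    b = np.array([np.sum(w * lx * proj), np.sum(w * ly * proj)])
    x_hat = np.linalg.solve(a, b)
    return x_hat, (np.sum(w * proj ** 2) - b @ x_hat) / np.trace(a)


def select_localization_scale(image, x0, scales, t_window):
    """Return (t_l, x_hat, d_norm), where t_l minimizes the normalized residual over scales."""
    best = None
    for t in scales:
        x_hat, d_norm = localize_corner(image, x0, t, t_window)
        if best is None or d_norm < best[2]:
            best = (t, x_hat, d_norm)
    return best


rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[32:, 32:] = 1.0  # corner at (31.5, 31.5)
noisy = image + 0.05 * rng.standard_normal(image.shape)
scales = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
t_l, x_hat, d_norm = select_localization_scale(noisy, (34.0, 33.0), scales, t_window=16.0)
print(t_l, x_hat, d_norm)
```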

7.3 Experiments: Choice of localization scale

Figure 13 and figure 14 show the result of applying this scale selection mechanism to a sharp and a diffuse corner with different amounts of added white Gaussian noise.
As can be seen, the results agree with the qualitative discussion above.
For each noise level, this table gives the scale at which the normalized residual assumes its minimum over scales, as well as the scale at which the estimate with the minimum absolute error over scales is obtained.
The results show that the normalized residual serves as an estimate of the inaccuracy in the corner localization estimate, and specifically that the scale at which the minimum over scales in d̃min is assumed is a reasonable estimate of the scale at which the authors have the localization estimate with the minimum absolute error.
Figure 15 shows the result of applying the composed junction localization stage to the junction candidates in figure 8.

7.4 Composed scheme for junction detection and localization

To summarize, the composed two-stage scheme for junction detection and junction localization consists of the following processing steps:⁵
1. Detection. Detect scale-space maxima in the square of the normalized rescaled level curve curvature
$$\tilde\kappa_{norm} = t^{2\gamma}\,\tilde\kappa = t^{2\gamma}\,(L_{x_2}^2 L_{x_1x_1} - 2 L_{x_1} L_{x_2} L_{x_1x_2} + L_{x_1}^2 L_{x_2x_2})$$
(or some other suitable normalized differential entity). This generates a set of junction candidates.
⁵ Besides the general descriptions given in previous sections.

7.5 Further experiments

Concerning the number of junction candidates to be processed and passed on to later processing stages, the authors have not made any attempts in this work to decide automatically how many of the extracted junction candidates correspond to physical junctions in the world.
The authors argue that such decisions require integration with higher-level reasoning and verification processes, and may be extremely hard to make at the earliest processing stages unless additional information is available about the external conditions.
For this reason, this module only aims at computing an early ranking of image features in order of significance, which can be used by a vision system for processing features in decreasing order of significance.
⁶ An integrated vision system for analysing junctions by actively zooming in to interesting structures is presented in (Brunnström et al. 1992; Lindeberg 1993a).


In line with this idea, the results are shown in terms of the N strongest junction candidates for different (manually chosen) values of N.
In figure 17, which shows corresponding examples for more cluttered scenes, the number of junctions displayed has been increased to 100 and 200.
Notably, this number of junction candidates constitutes the only essential tuning parameter of the composed algorithm.
Here, the 10 most significant junctions have been processed.
The table in figure 19 shows numerical values exemplifying how large the localization errors can be in the different processing stages.

7.6 Applications of corner detection with automatic scale selection

In (Lindeberg and Li 1995, 1997) it is shown how the support region associated with each junction allows for conceptually simple matching between junctions and edges based on spatial overlap only and without any need for providing externally determined thresholds on e.g. distance.
Then, the matching relations between edges and junction cues that arise in this way are used in a pre-processing stage for classifying edges into straight and curved.
In (Bretzner and Lindeberg 1996) it is demonstrated how these support regions can be used for simplifying matching of junctions over time in tracking algorithms.
Specifically, it is shown that the scale selection mechanism in the junction detector is essential to capture junctions that undergo large size changes.
In (Lindeberg 1995a, 1996d) a scale selection principle for stereo matching and flow estimation is presented, which also involves the extension of a fixed scale least squares estimation problem to optimization over multiple scales.

7.7 Extensions of the junction detection method

The main purpose of the presentation in this section has been to make explicit how a scale selection mechanism can be incorporated into a junction detector.
When building a stand-alone junction detector, a few additional mechanisms are natural to include.
Concerning the ranking on significance, the authors can conceive linking the maxima of the junction responses across scales in a similar way as done in the scale-space primal sketch (Lindeberg 1993a), register scale-space events such as bifurcations, and include the scale-space lifetime of each junction response into the significance measure.
Concerning the region of interest associated with each junction candidate, the authors have throughout this work represented the support region of a scale-space maximum by a circle with area reflecting the detection scale.
A possible limitation of this approach is that nearby junctions may lead to interference effects in operations such as the localization stage.

7.8 Extensions to edge detection

Concerning more general applications of the proposed methodology, it should be noted that the scale selection method for junction localization applies to edge detection as well.
The columns show, from left to right: (i) the local grey-level pattern, (ii) the signature of d̃min computed at the central point, and (iii) edges detected at the scale td = argmin d̃min at which the minimum over scales in d̃min was assumed.
In the first row, the authors can see that when performing edge detection at argmin d̃min they obtain a coherent edge descriptor corresponding to the dominant edge structure in this region.
In the second row, a large amount of white Gaussian noise has been added to the grey-level image, and the minimum over scales is assumed at a much coarser scale.
Concerning these experiments it should be pointed out that they are mainly intended to demonstrate the potential in applying the proposed method for selecting localization scales to the problem of edge detection, and that further processing steps are needed to give a complete algorithm.

8 Dense frequency estimation

So far, the authors have seen how the scale selection methodology can be applied to the detection of sparse feature points.
An obvious problem with basing a scale selection mechanism for computing dense image descriptors on a partial derivative of the intensity function, such as the Laplacian operator, is that the operator response would show large spatial variations.
A common methodology in signal processing for reducing this so-called phase dependency is to use quadrature filter pairs, defined (from a Hilbert transform) in such a way that the Euclidean sum of the filter responses is constant for any sine wave.
(As will be shown below, this scale value is of the same order of magnitude as the scales that maximize QL over scales; compare also with section 3.)
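For a one-dimensional sine wave, the effect of combining first- and second-order responses can be worked out directly. The sketch assumes the γ = 1 normalized form Q L = t L_x² + C t² L_xx², which is our reading of the quasi quadrature entity rather than a quotation of (66). For sin x (unit frequency) at x = 0, only the first term contributes, and the maximum over scales lies at t = 1 with value e^{−1}. At x = π/2, only the second term contributes, with the maximum at t = 2 and value 4Ce^{−2}. The two maxima coincide exactly when C = e/4, in line with the caption of figure 22.

```python
import numpy as np


def quasi_quadrature(x, t, C, omega=1.0):
    """Assumed form t L_x^2 + C t^2 L_xx^2 for L = exp(-omega^2 t / 2) sin(omega x)."""
    decay = np.exp(-omega ** 2 * t / 2.0)
    lx = omega * decay * np.cos(omega * x)
    lxx = -omega ** 2 * decay * np.sin(omega * x)
    return t * lx ** 2 + C * t ** 2 * lxx ** 2


def selected_scale(x, C, scales):
    """Scale that maximizes the quasi quadrature measure at position x, and its value."""
    q = quasi_quadrature(x, scales, C)
    i = int(np.argmax(q))
    return scales[i], q[i]


scales = np.linspace(0.01, 10.0, 100_000)
for C in (2.0 / 3.0, np.e / 4.0):  # the two values discussed in figures 21 and 22
    for x in (0.0, np.pi / 4.0, np.pi / 2.0):
        t_sel, q_max = selected_scale(x, C, scales)
        print(f"C={C:.3f} x={x:.3f} t_sel={t_sel:.3f} max={q_max:.4f}")
```

Under this assumed form, the selected scale varies between t = 1 and t = 2 over a period, whatever the value of C. The choice of C only affects how the maximum value varies with position.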
In the abovementioned sources, these specific entities and normalization parameters are shown to be useful for edge detection and ridge detection with automatic scale selection.

9.3 Relations to previous work

Such L1-normalized kernels of first order have been used, for example, in edge detection and edge classification by (Korn 1988), (Mallat and Zhong 1992), and (Zhang and Bergholm 1993), and in pyramids by (Crowley and Parker 1984).
More generally, evolution properties across scales of wavelet transforms have been used by (Mallat and Hwang 1992) for characterizing local Lipschitz exponents of singularities.
There is also a connection to the “top point” representation proposed by (Johansen et al. 1986) in the sense that the points in scale-space at which bifurcations occur serve to delimit extremum paths with different topology.
A main difference between the scale selection mechanism suggested here and the work in (Lindeberg 1991) and (Mallat and Hwang 1992), however, is that here it is shown how these notions can be applied to large classes of non-linear differential invariants computed in a scale-space representation.
Moreover, feature detection algorithms have been formulated with integrated scale selection mechanisms and it has been shown how different derivative normalization approaches lead to different classes of differential expressions for which the scale selection mechanism commutes with rescalings of the input pattern.

10 Summary and discussion

The authors have argued that the subject of scale selection is essential to many problems in computer vision and automated image analysis.
A general scale selection principle has been presented stating that in the absence of other evidence, coarse estimates of the size of image structures can be computed from the scales at which normalized differential geometric descriptors assume maxima over scales.
In the resulting algorithms, image features are first detected at adapted coarse scales and then localized to finer scales in a second processing stage.
Whereas the general advantages of such a two-stage approach to feature detection are well-known in the literature, a major contribution here is that explicit mechanisms are provided for automatic selection of the detection scales as well as the localization scales.
Moreover, these processing stages are integrated into algorithms which are essentially free from tuning parameters other than the number of features of interest.

10.1 Technical contributions

At a technically more detailed level, some of the main contributions are that:
It is emphasized how the evolution properties over scales of normalized scalespace derivatives differ from those of traditional spatial derivatives.
A general principle for scale selection has been proposed, stating that extrema over scales in the signature of normalized differential entities are useful in the stage of detecting image features.
The problem of junction detection is treated more extensively, and the resulting method is the first junction detection algorithm with automatic scale selection.
Specifically, it is shown how localization scales can be selected automatically by minimizing a certain normalized residual across scales.


Figures (26)

Figure 2: The variations over scales of two simple differential expressions formulated in terms of normalized derivatives, namely the trace and the determinant of the normalized Hessian matrix, computed with γ = 1 as $\operatorname{trace}\mathcal{H}_{norm}L = t\,(L_{xx} + L_{yy})$ and $\det\mathcal{H}_{norm}L = t^2\,(L_{xx}L_{yy} - L_{xy}^2)$.

Table 2: Measures of feature strength and normalization parameters used for different types of feature detectors with automatic scale selection (including results from a companion paper (Lindeberg 1996c, 1996b)). For each feature detector, a preferred γ-value is specified as well as the p-value for which the Lp-norm of the Gaussian derivatives is constant over scales (76) and the β-value for which the energy of a self-similar Fourier spectrum is constant over scales (89).

Figure 12: Minimizing (57) basically corresponds to finding the point x that minimizes the distance to all edge tangents in a neighbourhood of the given candidate junction point x0.

Figure 21: Spatial variation of the selected scale levels when maximizing the quasi quadrature entity (66) over scales for different values of the free parameter C using a one-dimensional sine wave of unit input frequency as input pattern. Observe that C = 2/3 gives rise to the most symmetric variations in the selected scale values.

Figure 22: Spatial variation of the maximum value over scales of the quasi quadrature entity (66) computed for different values of the free parameter C using a one-dimensional sine wave of unit input frequency as input pattern. As can be seen, the smallest spatial variations in the amplitude of the maximum response are obtained for C = e/4.
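The two captions above can be explored with a small numerical sketch. It assumes that the quasi quadrature entity (66) has the one-dimensional form Q = t L_x² + C t² L_xx²; this is an assumption, since (66) is not reproduced in this excerpt. For the input sin x the scale-space representation is L = e^(−t/2) sin x, so at a peak of L_x (x = 0) the maximum over scales is e^(−1), attained at t = 1, while at a peak of L_xx (x = π/2) it is 4C e^(−2), attained at t = 2. These two amplitudes coincide exactly when C = e/4, which is consistent with the observation in figure 22. The helper names and the scale grid are hypothetical.

```python
import math

def quasi_quadrature(x: float, t: float, C: float) -> float:
    """Assumed 1-D form Q = t*Lx^2 + C*t^2*Lxx^2 for f(x) = sin(x) at scale t.

    Gaussian smoothing with variance t scales sin(x) by exp(-t/2).
    """
    decay = math.exp(-t / 2)
    lx = decay * math.cos(x)
    lxx = -decay * math.sin(x)
    return t * lx ** 2 + C * t ** 2 * lxx ** 2

def max_over_scales(x: float, C: float, ts):
    """Selected scale and maximum of Q over the sampled scales ts at position x."""
    t_best = max(ts, key=lambda t: quasi_quadrature(x, t, C))
    return t_best, quasi_quadrature(x, t_best, C)

ts = [0.001 * k for k in range(1, 5001)]      # scales 0.001 ... 5.0
xs = [k * math.pi / 16 for k in range(9)]     # from a peak of L_x to a peak of L_xx
for C in (2 / 3, math.e / 4):
    results = [max_over_scales(x, C, ts) for x in xs]
    scales = [t for t, _ in results]
    amps = [q for _, q in results]
    print(f"C = {C:.4f}: selected t in [{min(scales):.3f}, {max(scales):.3f}], "
          f"max Q in [{min(amps):.4f}, {max(amps):.4f}]")
```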

Figure 5: Three-dimensional view of the 150 strongest scale-space maxima of the square of the normalized Laplacian of the Gaussian computed from the sunflower image. (A dark copy of the original grey-level image is shown in the ground plane, and the vertical dimension represents scale.)

Figure 15: Improved localization estimates for the junction candidates in figure 8. Each junction has been graphically illustrated by a circle centered at the new location estimate. In the left image, the size reflects the detection scale, whereas in the right image, the size reflects the localization scale.

Figure 1: The amplitude of first order normalized derivatives as function of scale for sinusoidal input signals of different frequencies (ω1 = 0.5, ω2 = 1.0 and ω3 = 2.0).
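Figure 1 can be reproduced analytically. Assuming the γ-normalized first-order derivative ∂_ξ = t^(γ/2) ∂_x (the usual definition of γ-normalized derivatives, stated here as an assumption because section 3 is cut off in this excerpt), the input sin(ωx) gives L_x with amplitude ω e^(−ω²t/2), so the normalized amplitude t^(γ/2) ω e^(−ω²t/2) peaks at t = γ/ω². For γ = 1 the selected scale is therefore inversely proportional to ω², and the peak value e^(−1/2) does not depend on ω. The brute-force scan below checks this; the function names and the scan range are hypothetical.

```python
import math

def normalized_amplitude(omega: float, t: float, gamma: float = 1.0) -> float:
    """Amplitude of t**(gamma/2) * L_x for the input f(x) = sin(omega * x).

    Smoothing with a Gaussian of variance t multiplies sin(omega x) by
    exp(-omega**2 * t / 2), so L_x has amplitude omega * exp(-omega**2 * t / 2).
    """
    return t ** (gamma / 2) * omega * math.exp(-omega ** 2 * t / 2)

def scale_of_maximum(omega: float, gamma: float = 1.0,
                     t_min: float = 1e-3, t_max: float = 100.0, n: int = 20000):
    """Brute-force maximization over n + 1 logarithmically spaced scales."""
    best_t, best_a = t_min, -1.0
    for k in range(n + 1):
        t = t_min * (t_max / t_min) ** (k / n)
        a = normalized_amplitude(omega, t, gamma)
        if a > best_a:
            best_t, best_a = t, a
    return best_t, best_a

for omega in (0.5, 1.0, 2.0):                  # the frequencies used in figure 1
    t_hat, a_hat = scale_of_maximum(omega)
    print(f"omega = {omega}: t_max = {t_hat:.3f} (1/omega^2 = {1 / omega ** 2:.3f}), "
          f"peak = {a_hat:.4f}")
```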

Figure 23: Dense scale selection by maximizing the quasi quadrature measure (68) over scales: (left) Original grey-level image. (right) The variations over scales of the quasi quadrature measure QL computed along a vertical cross-section through the center of the image. The result is visualized as a surface plot showing the variations over scale of the quasi quadrature measure as well as the position of the first local maximum over scales.

Figure 10: Scale-space signatures of κ̃norm for diffuse L-junctions (t0 = 1.0) of different spatial extent (1/4 and 1/16 of the image size). (left) original grey-level image, (middle) path signature of κ̃norm accumulated by tracking a spatial maximum in κ̃norm across scales, (right) vertical signature of κ̃norm accumulated at the central point.

Figure 9: Scale-space signatures of κ̃norm for synthetic L-junctions with different degrees of diffuseness (top t = 4.0, bottom t = 64.0). (left) original grey-level image, (middle) path signature of κ̃norm accumulated by tracking a spatial maximum in κ̃norm across scales, (right) vertical signature of κ̃norm accumulated at the central point.
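A closed-form counterpart of the diffuse-junction signatures in figure 9 can be sketched under stated assumptions. We assume that the junction measure is the rescaled level curve curvature κ̃ = L_x² L_yy + L_y² L_xx − 2 L_x L_y L_xy, normalized as κ̃norm = t^(2γ) κ̃, and we model a diffuse L-junction as the quarter-plane indicator f = H(x) H(y) blurred to variance t0 (a hypothetical model that need not match the paper's exactly). At the vertex κ̃ = −1/(8π² (t0 + t)²), so κ̃norm² is proportional to t^(4γ)/(t0 + t)⁴ and peaks at t = γ t0/(1 − γ). The value γ = 7/8 below is only illustrative; the preferred γ-values are those listed in Table 2.

```python
import math

def smoothed_l_junction(x: float, y: float, s: float) -> float:
    """H(x) H(y) blurred by a Gaussian of variance s (hypothetical L-junction model)."""
    def cdf(u: float) -> float:
        return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0 * s)))
    return cdf(x) * cdf(y)

def kappa_tilde(L, x: float, y: float, h: float) -> float:
    """Lx^2 Lyy + Ly^2 Lxx - 2 Lx Ly Lxy from central differences with step h."""
    lx = (L(x + h, y) - L(x - h, y)) / (2 * h)
    ly = (L(x, y + h) - L(x, y - h)) / (2 * h)
    lxx = (L(x + h, y) - 2 * L(x, y) + L(x - h, y)) / h ** 2
    lyy = (L(x, y + h) - 2 * L(x, y) + L(x, y - h)) / h ** 2
    lxy = (L(x + h, y + h) - L(x + h, y - h)
           - L(x - h, y + h) + L(x - h, y - h)) / (4 * h * h)
    return lx ** 2 * lyy + ly ** 2 * lxx - 2 * lx * ly * lxy

def kappa_norm_sq_at_vertex(t0: float, t: float, gamma: float) -> float:
    """(t**(2 gamma) * kappa_tilde)**2 at the vertex. Smoothing the t0-blurred
    junction to scale t gives total variance s = t0 + t (semigroup property)."""
    s = t0 + t
    L = lambda x, y: smoothed_l_junction(x, y, s)
    h = 1e-2 * math.sqrt(s)  # step proportional to the blur width
    return (t ** (2 * gamma) * kappa_tilde(L, 0.0, 0.0, h)) ** 2

def selected_scale(t0: float, gamma: float, ts):
    """Scale in ts maximizing the squared normalized junction response."""
    return max(ts, key=lambda t: kappa_norm_sq_at_vertex(t0, t, gamma))

gamma = 7 / 8                            # illustrative value only
ts = [0.5 * k for k in range(1, 2001)]   # scales 0.5 ... 1000
for t0 in (4.0, 8.0, 16.0):
    print(f"t0 = {t0}: selected t = {selected_scale(t0, gamma, ts)}, "
          f"gamma/(1 - gamma) * t0 = {gamma / (1 - gamma) * t0}")
```

With γ = 1 the ratio t²/(t0 + t)² increases monotonically towards 1 in this model, so no finite scale would be selected, which is one way to see why a γ-value below 1 is used for this measure.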

Figure 4: The 250 most significant normalized scale-space extrema detected from the perspective projection of a sine wave of the form (with 10% added Gaussian noise).

Figure 3: Normalized scale-space maxima computed from an image of a sunflower field: (top left): Original image. (top right): Circles representing the 250 normalized scale-space maxima of (trace H_norm L)² having the strongest normalized response. (bottom left): Circles representing scale-space maxima of (trace H_norm L)² superimposed onto a bright copy of the original image. (bottom right): Corresponding results for scale-space maxima of (det H_norm L)².
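The behaviour behind figure 3 can be checked on an idealized model pattern. The sketch below assumes a rotationally symmetric Gaussian blob of variance t0, so that by the semigroup property L = g(·; t0 + t), and evaluates the squared normalized trace and the normalized determinant of the Hessian at the blob centre with γ = 1 (as in the caption of figure 2). At the centre L_xx = L_yy = −1/(2π (t0 + t)²) and L_xy = 0, so both measures are increasing functions of t/(t0 + t)², and both select the scale t = t0. The blob model and the helper names are hypothetical.

```python
import math

def hessian_at_blob_centre(t0: float, t: float):
    """(Lxx, Lyy, Lxy) at the centre of a 2-D Gaussian blob of variance t0
    smoothed to scale t, i.e. of g(.; s) with s = t0 + t."""
    s = t0 + t
    lxx = lyy = -1.0 / (2 * math.pi * s ** 2)
    return lxx, lyy, 0.0

def normalized_measures(t0: float, t: float, gamma: float = 1.0):
    """((trace H_norm L)^2, det H_norm L) with second derivatives scaled by t**gamma."""
    lxx, lyy, lxy = hessian_at_blob_centre(t0, t)
    trace_sq = (t ** gamma * (lxx + lyy)) ** 2
    det = t ** (2 * gamma) * (lxx * lyy - lxy ** 2)
    return trace_sq, det

def selected_scales(t0: float, ts):
    """Scales in ts maximizing each of the two measures."""
    by_trace = max(ts, key=lambda t: normalized_measures(t0, t)[0])
    by_det = max(ts, key=lambda t: normalized_measures(t0, t)[1])
    return by_trace, by_det

ts = [0.1 * k for k in range(1, 2001)]   # scales 0.1 ... 200
for t0 in (4.0, 16.0, 64.0):
    print(f"t0 = {t0}: selected (trace, det) scales = {selected_scales(t0, ts)}")
```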

Figure 19: Localization errors in the different processing stages of the composed junction detection scheme applied to the synthetic junction models in figure 18: (i) in the detection stage, (ii) after one localization step, (iii) after convergence of the iterative procedure.

Figure 18: The result of applying the composed junction detection scheme to synthetic junction models with added Gaussian noise (ν = 10.0).

Figure 20: Illustration of the result of applying minimization over scales of the normalized residual d̃min to different types of edge structures: (left) grey-level image, (middle) scale-space signature of d̃min accumulated at the central point, (right) (unthresholded) edges computed at the scale where d̃min assumes its minimum over scales.

Figure 7 shows the results of applying this operator to a grey-level image at a number of different scales. The results are displayed in two ways: (i) in terms of grey-level images showing the scale-space representation L as well as the junction response κ̃² computed at each scale, and (ii) in terms of the 50 strongest spatial maxima of κ̃² extracted at the same scale levels. As can be seen, qualitatively different responses are obtained at different scales. At fine scales, the strongest responses are obtained for the sharp corners and for a number of spurious fine-scale perturbations along edges. Then, with increasing values of the scale parameter, the selectivity to junction-like structures increases. In particular, the diffuse (non-sharp) and strongly rounded corners only give rise to strong responses at coarse scales. In summary, this example illustrates the following fundamental aspects of multi-scale corner detection:

Figure 14: Corresponding results for a synthetic diffuse T -junction (t0 = 64.0).

Figure 13: Scale-space signatures of the normalized residual at a synthetic sharp T-junction (t0 = 0.0) for different amounts of added white Gaussian noise (ν = 1.0 and 30.0): (left) grey-level image, (middle) signature of d̃min accumulated at the central point, (right) localization estimate computed at the scale at which d̃min assumes its minimum over scales (illustrated by a circle overlaid onto a bright copy of the image smoothed to that scale).

Figure 6: The 50 strongest spatial responses to the Laplacian operator computed at the scale levels: (a) t = 4.0, (b) t = 16.0, and (c) t = 64.0. Observe how this blob detector leads to a bias towards image structures of a certain size.


Feature Detection with Automatic Scale Selection

Tony Lindeberg

∗

Computational Vision and Active Perception Laboratory (CVAP)

Department of Numerical Analysis and Computing Science

KTH (Royal Institute of Technology)

S-100 44 Stockholm, Sweden.

http://www.nada.kth.se/~tony

Email: tony@nada.kth.se

Technical report ISRN KTH/NA/P–96/18–SE, May 1996, Revised August 1998.

Int. J. of Computer Vision, vol 30, number 2, 1998. (In press).

Abstract

The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select local appropriate scales for further analysis.

This article proposes a systematic methodology for dealing with this problem. A framework is proposed for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γ-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure.

Support for the proposed approach is given in terms of a general theoretical investigation of the behaviour of the scale selection method under rescalings of the input pattern and by experiments on real-world and synthetic data. Support is also given by a detailed analysis of how different types of feature detectors perform when integrated with a scale selection mechanism and then applied to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation.

In many computer vision applications, the poor performance of the low-level vision modules constitutes a major bottle-neck. It will be argued that the inclusion of mechanisms for automatic scale selection is essential if we are to construct vision systems to analyse complex unknown environments.

Keywords: scale, scale-space, scale selection, normalized derivative, feature detection, blob detection, corner detection, frequency estimation, Gaussian derivative, scale-space, multi-scale representation, computer vision

∗ This work was partially performed under the ESPRIT-BRA project INSIGHT and the ESPRIT-NSF collaboration DIFFUSION. The support from the Swedish Research Council for Engineering Sciences, TFR, is gratefully acknowledged. The three-dimensional illustrations in figure 5 and figure 11 have been generated with the kind assistance of Pascal Grostabussiat.


Contents

1 Introduction
1.1 Outline of the presentation
2 Scale-space representation: Review
3 Normalized derivatives and intuitive idea for scale selection
4 Proposed methodology for scale selection
4.1 General scaling property of local maxima over scales
4.2 The scale selection mechanism in practice
4.3 Experiments: Scale-space signatures from real data
4.4 Simultaneous detection of interesting points and scales
5 Blob detection with automatic scale selection
5.1 Analysis of scale-space maxima for idealized model patterns
5.2 Comparisons with fixed-scale blob detection
5.3 Applications of blob detection with automatic scale selection
6 Junction detection with automatic scale selection
6.1 Selection of detection scales from normalized scale-space maxima
6.2 Analysis of scale-space maxima for diffuse junction models
6.3 Experiments: Scale-space signatures in junction detection
7 Feature localization with automatic scale selection
7.1 Corner localization by local consistency
7.2 Automatic selection of localization scales
7.3 Experiments: Choice of localization scale
7.4 Composed scheme for junction detection and localization
7.5 Further experiments
7.6 Applications of corner detection with automatic scale selection
7.7 Extensions of the junction detection method
7.8 Extensions to edge detection
8 Dense frequency estimation
9 Analysis and interpretation of normalized derivatives
9.1 Interpretation of γ-normalized derivatives in terms of Lp-norms
9.2 Interpretation in terms of self-similar Fourier spectrum
9.3 Relations to previous work
10 Summary and discussion
10.1 Technical contributions
A Appendix
A.1 Necessity of the form of the γ-parameterized derivative operator
A.2 Lp-normalization interpretation of γ-normalized derivatives
A.3 Normalized derivative responses to self-similar power spectra
A.4 Discrete implementation of the scale selection mechanisms


1 Introduction

One of the very fundamental problems that arises when analysing real-world measurement data originates from the fact that objects in the world may appear in different ways depending upon the scale of observation. This fact is well-known in physics, where phenomena are modelled at several levels of scale, ranging from particle physics and quantum mechanics at fine scales, through thermodynamics and solid mechanics dealing with every-day phenomena, to astronomy and relativity theory at scales much larger than those we are usually dealing with. Notably, the type of physical description that is obtained may be strongly dependent on the scale at which the world is modelled, and this is in clear contrast to certain idealized mathematical entities, such as “point” or “line”, which are independent of the scale of observation.

In certain controlled situations, appropriate scales for analysis may be known a priori. For example, a desirable property of a good physicist is his intuitive ability to select appropriate scales to model a given situation. Under other circumstances, however, it may not be at all obvious in advance what the proper scales are. One such example is a vision system with the task of analysing unknown scenes. Besides the inherent multi-scale properties of real-world objects (which, in general, are unknown), such a system has to face the problems that the perspective mapping gives rise to size variations, that noise is introduced in the image formation process, and that the available data are two-dimensional data sets reflecting only indirect properties of a three-dimensional world. To be able to cope with these problems, an image representation that explicitly incorporates the notion of scale is a crucially important tool whenever we attempt to interpret sensory data, such as images, by automatic methods.

In computer vision and image processing, these insights have led to the construction of multi-scale representations of image data, obtained by embedding any given signal into a one-parameter family of derived signals (Burt 1981; Crowley 1981; Witkin 1983; Koenderink 1984; Yuille and Poggio 1986; Florack et al. 1992; Lindeberg 1994d; Haar Romeny 1994). This family should be parameterized by a scale parameter and be generated in such a way that fine-scale structures are successively suppressed when the scale parameter is increased. A main intention behind this construction is to obtain a separation of the image structures in the original image, such that fine-scale image structures only exist at the finest scales in the multi-scale representation. Thereby, the task of operating on the image data will be simplified, provided that the operations are performed at sufficiently coarse scales where unnecessary and irrelevant fine-scale structures have been suppressed. Empirically, this idea has proved to be extremely useful, and multi-scale representations such as pyramids, scale-space representation and non-linear diffusion methods are commonly used as preprocessing steps to a large number of early visual operations, including feature detection, stereo matching, optic flow, and the computation of shape cues.

A multi-scale representation by itself, however, contains no explicit information about what image structures should be regarded as significant or what scales are appropriate for treating those. Hence, unless early judgements can be made about what image structures should be regarded as important, we obtain a substantial expansion of the amount of data to be interpreted by later stage processes. In most previous works, this problem has been handled by formulating algorithms which rely on the information present in the data at a small set of manually chosen scales (or even a single scale). Alternatively, coarse-to-fine algorithms have been expressed, which start at a given coarse scale and propagate down to a given finer scale. Determining such scales in advance, however, leads to the introduction of free parameters. If one aims at autonomous algorithms which are to operate in a complex environment without need for external parameter tuning, we therefore argue that it is essential to complement traditional multi-scale processing by explicit mechanisms for scale selection. Notably, image descriptors can be highly unstable if computed at inappropriately chosen scales, whereas a proper tuning of the scale parameter can improve the quality of an image descriptor substantially. As will be demonstrated later, local scale information can also constitute an important cue in its own right.

Early work addressing this problem was presented in (Lindeberg 1991, 1993a) for blob-like image structures. The basic idea was to study the behaviour of image structures over scales, and to measure the saliency of image structures from the stability properties and the lifetime of these structures in scale-space. Scale levels were selected from the scales at which a measure of blob strength assumed local maxima over scales, and significant image structures were selected from the stability of the blob structures in scale-space. Experimentally, it was shown that this approach could be used for extracting regions of interest with associated scale levels, which in turn could serve to guide various early visual processes.

The subject of this article is to address the problem of automatic scale selection in a more general setting, for wider classes of image descriptors. We shall be concerned with the problem of extracting image features and computing filter-like image descriptors, and present a scale selection principle for image descriptors which can be expressed in terms of Gaussian derivative filters. The general idea for scale selection that will be proposed is to study the evolution properties over scales of normalized differential descriptors. Specifically, it will be suggested that local extrema over scales of such normalized differential entities are likely to correspond to interesting image structures. By theoretical considerations and experiments it will be shown that this approach gives rise to intuitively reasonable results in different situations and that it provides a unified framework for scale selection for detecting image features such as blobs, corners, edges and ridges.

1.1 Outline of the presentation

The presentation is organized as follows: Section 2 reviews the main concepts from scale-space theory we build upon. Section 3 introduces the notion of normalized derivatives and illustrates how maxima over scales of normalized Gaussian derivatives reflect the frequency content in sine wave patterns. This material serves as a preparation for section 4, which presents the proposed scale selection methodology and shows how it applies generally to a large class of differential descriptors. Section 4 also proposes a general extension of the common idea of defining features as zero-crossings of spatial differential descriptors. If a scale selection mechanism is integrated into such a feature detector, this corresponds to adding another zero-crossing requirement over the scale dimension in the differential feature definition.
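The extra zero-crossing requirement over scale can be made concrete with a small one-dimensional sketch. It assumes γ = 1 and uses the normalized blob measure −t L_xx on a hypothetical signal made of two Gaussian blobs; a detection is a grid point that is a local maximum both along x and along t. For an isolated blob of variance t0, the response at its centre is proportional to t (t0 + t)^(−3/2), which peaks at t = 2 t0, so the two detections should appear near t = 8 and t = 32. The signal, the grid spacings and all helper names are arbitrary choices for illustration.

```python
import math

def gaussian(x: float, var: float) -> float:
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_xx(x: float, var: float) -> float:
    """Second derivative of the 1-D Gaussian kernel with variance var."""
    return (x * x / var ** 2 - 1 / var) * gaussian(x, var)

# Hypothetical test signal: two 1-D Gaussian blobs given as (centre, variance t0).
BLOBS = [(-20.0, 4.0), (25.0, 16.0)]

def response(x: float, t: float, gamma: float = 1.0) -> float:
    """-t**gamma * L_xx; smoothing a blob of variance t0 to scale t gives variance t0 + t."""
    lxx = sum(gaussian_xx(x - c, t0 + t) for c, t0 in BLOBS)
    return -(t ** gamma) * lxx

def scale_space_maxima(xs, ts):
    """Interior grid points with a positive response that is a strict local
    maximum over its 8 neighbours in (x, t): a discrete stand-in for requiring
    vanishing derivatives along x and along t simultaneously."""
    R = [[response(x, t) for x in xs] for t in ts]
    found = []
    for i in range(1, len(ts) - 1):
        for j in range(1, len(xs) - 1):
            v = R[i][j]
            if v <= 0:
                continue
            neighbours = (R[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0))
            if all(v > n for n in neighbours):
                found.append((xs[j], ts[i], v))
    return found

xs = [-50 + 0.5 * k for k in range(221)]     # x from -50 to 60
ts = [0.5 * 1.05 ** k for k in range(110)]   # t from 0.5 to about 100, geometric
for x, t, v in scale_space_maxima(xs, ts):
    print(f"x = {x:6.1f}   t = {t:7.2f}   response = {v:.4f}")
```

A geometric sampling of t is used because the scale levels of interest span a wide range; in two dimensions the same neighbourhood test would run over (x, y, t).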

Then, section 5 and section 6 show in detail how these ideas can be used for formulating blob detectors and corner detectors with automatic scale selection. Section 8 shows an example of how this approach applies to the computation of dense feature maps. Section 9 describes different ways of interpreting the normalized derivative concept. Finally, section 10 summarizes the main results and ideas of the approach. In a complementary paper (Lindeberg 1996a) it is developed in detail how this approach applies to edge detection and ridge detection.

Earlier presentations of different parts of this material have appeared elsewhere (Lindeberg 1993b, 1994a, 1994d, 1996b) as well as applications of the general ideas to various problem domains (Lindeberg and Gårding 1993, 1997; Gårding and Lindeberg 1996; Lindeberg and Li 1995, 1997; Bretzner and Lindeberg 1998, 1997; Almansa and Lindeberg 1996; Wiltschi et al. 1997; Lindeberg 1997). The subject of this paper is to present a coherent description of the proposed scale selection methodology in journal form, including the developments and refinements that have been performed since the earliest presented manuscripts.

2 Scale-space representation: Review

Given any continuous signal $f: \mathbb{R}^N \to \mathbb{R}$, the (linear) scale-space representation $L: \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ of $f$ is defined as the solution to the diffusion equation

$$\partial_t L = \frac{1}{2} \nabla^2 L = \frac{1}{2} \sum_{i=1}^{N} \partial_{x_i x_i} L \qquad (1)$$

with initial condition $L(\cdot;\, 0) = f(\cdot)$. Equivalently, this family can be defined by convolution with Gaussian kernels of various widths $t$

$$L(\cdot;\, t) = g(\cdot;\, t) * f(\cdot), \qquad (2)$$

where $g: \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ is given by

$$g(x;\, t) = \frac{1}{(2\pi t)^{N/2}}\, e^{-(x_1^2 + \cdots + x_N^2)/(2t)}, \qquad (3)$$

and $x = (x_1, \ldots, x_N)^T$. There are several mathematical results (Koenderink 1984; Babaud et al. 1986; Yuille and Poggio 1986; Lindeberg 1990, 1994d, 1994b; Koenderink and van Doorn 1990, 1992; Florack et al. 1992; Florack 1993; Florack et al. 1994; Pauwels et al. 1995) stating that within the class of linear transformations the Gaussian kernel is the unique kernel for generating a scale-space. The conditions that specify the uniqueness are essentially linearity and shift invariance combined with different ways of formalizing the notion that new structures should not be created in the transformation from a finer to a coarser scale.
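As a numerical sanity check of equations (1) to (3) in one dimension (N = 1), the sketch below verifies that the Gaussian kernel satisfies ∂_t g = ½ ∂_xx g (the factor ½ comes from parameterizing the kernel by its variance t), and that convolving kernels of variances t1 and t2 gives the kernel of variance t1 + t2. The latter is the semigroup property, a standard consequence of (3) that is not stated explicitly above. Step sizes and sample points are arbitrary choices.

```python
import math

def g(x: float, t: float) -> float:
    """Equation (3) with N = 1: Gaussian kernel of variance t."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def diffusion_residual(x: float, t: float, h: float = 1e-3) -> float:
    """|dg/dt - (1/2) d^2g/dx^2| from central differences; near zero if (1) holds."""
    dg_dt = (g(x, t + h) - g(x, t - h)) / (2 * h)
    d2g_dx2 = (g(x + h, t) - 2 * g(x, t) + g(x - h, t)) / h ** 2
    return abs(dg_dt - 0.5 * d2g_dx2)

def convolved_kernels(x: float, t1: float, t2: float,
                      step: float = 0.01, half_width: float = 30.0) -> float:
    """(g(.; t1) * g(.; t2))(x) by a Riemann sum; should equal g(x; t1 + t2)."""
    n = int(half_width / step)
    return step * sum(g(k * step, t1) * g(x - k * step, t2) for k in range(-n, n + 1))

for x in (0.0, 0.7, 2.5):
    print(x, diffusion_residual(x, 1.5), convolved_kernels(x, 1.0, 2.0), g(x, 3.0))
```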

Interestingly, the results from these theoretical considerations are in qualitative agreement with the results of biological evolution. Neurophysiological studies by Young (1985, 1987) have shown that there are receptive fields in the mammalian retina and visual cortex whose measured response profiles can be well modelled by Gaussian derivatives up to order four. In these respects, the scale-space representation with its associated Gaussian derivative operators (where α denotes the order of differentiation)

$$L_{x^\alpha}(\cdot;\, t) = (\partial_{x^\alpha} L)(\cdot;\, t) = \partial_{x^\alpha}(g * f) = (\partial_{x^\alpha} g) * f = g * (\partial_{x^\alpha} f), \qquad (4)$$

can be seen as a canonical idealized model of a visual front-end. It is for this multi-scale representation concept that we will develop the scale selection methodology.
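The identities in (4) can be illustrated numerically for N = 1 and first-order differentiation: differentiating the smoothed signal, convolving with the differentiated kernel, and smoothing the differentiated signal should all give the same value. The sketch below uses the smooth input f = tanh purely as a hypothetical example and approximates each convolution by a Riemann sum; the helper names and step sizes are illustrative choices.

```python
import math

def g(u: float, t: float) -> float:
    """1-D Gaussian kernel of variance t."""
    return math.exp(-u * u / (2 * t)) / math.sqrt(2 * math.pi * t)

def g_x(u: float, t: float) -> float:
    """Its first derivative, -u/t * g(u; t)."""
    return -u / t * g(u, t)

def f(u: float) -> float:
    """Smooth hypothetical input signal."""
    return math.tanh(u)

def f_x(u: float) -> float:
    """Exact derivative of f."""
    return 1.0 - math.tanh(u) ** 2

def convolve(kernel, signal, x: float, t: float,
             step: float = 0.01, half_width: float = 20.0) -> float:
    """(kernel(.; t) * signal)(x) approximated by a Riemann sum over u."""
    n = int(half_width / step)
    return step * sum(kernel(k * step, t) * signal(x - k * step)
                      for k in range(-n, n + 1))

def three_ways(x: float, t: float, h: float = 1e-3):
    """The three first-derivative expressions in (4), evaluated at x."""
    smooth_then_diff = (convolve(g, f, x + h, t) - convolve(g, f, x - h, t)) / (2 * h)
    diff_kernel = convolve(g_x, f, x, t)
    diff_signal = convolve(g, f_x, x, t)
    return smooth_then_diff, diff_kernel, diff_signal

for x in (-1.0, 0.0, 0.3, 2.0):
    print(x, three_ways(x, 2.0))
```

The middle form, convolving the data with a derivative of the Gaussian, needs no derivative of the (possibly noisy) input, which is one reason it is the common way of computing Gaussian derivatives in practice.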

3 Normalized derivatives and intuitive idea for scale selection

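A well-known property of the scale-space representation is that the amplitude of spatial derivatives

$$L_{x^\alpha}(\cdot;\, t) = \partial_{x^\alpha} L(\cdot;\, t) = \partial_{x_1^{\alpha_1}} \cdots \partial_{x_N^{\alpha_N}} L(\cdot;\, t) \qquad (5)$$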
