Feature Detection with Automatic Scale Selection
Summary (9 min read)
1 Introduction
- One of the very fundamental problems that arises when analysing real-world measurement data originates from the fact that objects in the world may appear in different ways depending upon the scale of observation.
- Notably, the type of physical description that is obtained may be strongly dependent on the scale at which the world is modelled, and this is in clear contrast to certain idealized mathematical entities, such as “point” or “line”, which are independent of the scale of observation.
- Under other circumstances, however, it may not be at all obvious how to determine in advance what the proper scales are.
- A main intention behind this construction is to obtain a separation of the image structures in the original image, such that fine scale image structures only exist at the finest scales in the multi-scale representation.
- The subject of this article is to address the problem of automatic scale selection in a more general setting, for wider classes of image descriptors.
1.1 Outline of the presentation
- The presentation is organized as follows: Section 2 reviews the main concepts from scale-space theory the authors build upon.
- Section 3 introduces the notion of normalized derivatives and illustrates how maxima over scales of normalized Gaussian derivatives reflect the frequency content in sine wave patterns.
- This material serves as a preparation for section 4, which presents the proposed scale selection methodology and shows how it applies generally to a large class of differential descriptors.
- Section 8 shows an example of how this approach applies to the computation of dense feature maps.
- In a complementary paper (Lindeberg 1996a) it is developed in detail how this approach applies to edge detection and ridge detection.
2 Scale-space representation: Review
- There are several mathematical results (Koenderink 1984; Babaud et al.
- Interestingly, the results from these theoretical considerations are in qualitative agreement with the results of biological evolution.
- Neurophysiological studies by (Young 1985, 1987) have shown that there are receptive fields in the mammalian retina and visual cortex, whose measured response profiles can be well modelled by Gaussian derivatives up to order four.
3 Normalized derivatives and intuitive idea for scale selection
- This is a direct consequence of the non-enhancement property of local extrema, which states that the value at a local maximum cannot increase, and the value at a local minimum cannot decrease.
- In practice, it means that the amplitude of the variations in a signal will always decrease with scale.
- In the special case when γ = 1, these ξ-coordinates and their associated normalized derivative operator are dimensionless.
- As the authors shall see later, however, values of γ < 1 will be highly useful when formulating scale selection mechanisms for edge detection and ridge detection.
- For the sinusoidal signal, the amplitude of an $m$th order normalized derivative as a function of scale is then given by $L_{\xi^m,\max}(t) = t^{m\gamma/2}\, \omega_0^m\, e^{-\omega_0^2 t/2}$ (11), i.e., it first increases and then decreases.
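As a small numerical check of (11), the sketch below evaluates the amplitude on a logarithmic grid of scales and compares the maximizing scale with the closed-form value obtained by setting the derivative of (11) with respect to t to zero, which gives t_max = γm/ω0², i.e. σ_max = sqrt(γm)·λ0/(2π) for wavelength λ0 = 2π/ω0. The function names, the search range and the grid density are illustrative choices, not part of the original presentation.

```python
import math


def normalized_amplitude(t, omega0, m, gamma=1.0):
    """Amplitude (11) of the m-th order gamma-normalized derivative of sin(omega0 * x)."""
    return t ** (m * gamma / 2.0) * omega0 ** m * math.exp(-omega0 ** 2 * t / 2.0)


def numeric_argmax_scale(omega0, m, gamma=1.0, t_lo=1e-3, t_hi=1e3, samples=20001):
    """Brute-force search for the maximizing scale on a logarithmic grid (assumed range)."""
    best_t, best_value = t_lo, -1.0
    for i in range(samples):
        t = t_lo * (t_hi / t_lo) ** (i / (samples - 1))
        value = normalized_amplitude(t, omega0, m, gamma)
        if value > best_value:
            best_t, best_value = t, value
    return best_t


def analytic_argmax_scale(omega0, m, gamma=1.0):
    """Setting d/dt of (11) to zero gives t_max = gamma * m / omega0**2."""
    return gamma * m / omega0 ** 2


if __name__ == "__main__":
    for omega0 in (0.25, 0.5, 1.0):
        print(omega0,
              numeric_argmax_scale(omega0, m=2),
              analytic_argmax_scale(omega0, m=2))
```

Halving the frequency multiplies the selected scale t by four, so the selected σ = sqrt(t) doubles along with the wavelength.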
4 Proposed methodology for scale selection
- The example above shows that the scale at which a normalized derivative assumes its maximum over scales is for a sinusoidal signal proportional to the wavelength of the signal.
- Maxima over scales of normalized derivatives reflect the scales over which spatial variations take place in the signal.
- This operation corresponds to an interesting computational structure, since it constitutes a way of estimating length based on local measurements performed at only a single spatial point in the scale-space representation, and without explicitly laying out a ruler.
- Moreover, compared to a local windowed Fourier transform there is no need for making any explicit settings of window size for computing the Fourier transform.
- This principle is closely related to, although not equivalent to, the method for scale selection previously proposed in (Lindeberg 1991, 1993a), where interesting scale levels were determined from maxima over scales of a normalized blob measure.
4.1 General scaling property of local maxima over scales
- A basic justification for the abovementioned arguments can be obtained from the fact that for a large class of (possibly non-linear) combinations of normalized derivatives it holds that maxima over scales have a nice behaviour under rescalings of the intensity pattern.
- Let us hence restrict the analysis to polynomial differential invariants which are homogeneous in the sense that the sum of the orders of differentiation is the same for each term in the polynomial.
- For a differential expression of this form, the corresponding normalized differential expression in each domain is given by $\mathcal{D}_{\gamma\text{-norm}} L = t^{M\gamma/2}\, \mathcal{D} L$ (26) and $\mathcal{D}'_{\gamma\text{-norm}} L' = t'^{M\gamma/2}\, \mathcal{D}' L'$ (27). From (23) it follows that these normalized differential expressions are related by $\mathcal{D}_{\gamma\text{-norm}} L = s^{M(1-\gamma)}\, \mathcal{D}'_{\gamma\text{-norm}} L'$ (28). Clearly, by γ-normalization with γ ≠ 1, the magnitude of the derivative is not scale invariant.
- Hence, even when γ ≠ 1, the authors can achieve sufficient scale invariance to support the proposed scale selection methodology.
- From the transformation property (23), it is, however, apparent that this magnitude measure will be strongly dependent on the scale at which the maximum over scales is assumed.
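The step from (26) and (27) to (28) can be written out as follows. This is a sketch under the usual rescaling conventions, which are assumed here rather than quoted from the text: the two domains are related by x' = s x and t' = s² t, and homogeneity of total order M gives $\mathcal{D} L = s^{M} \mathcal{D}' L'$.

```latex
\begin{align*}
\mathcal{D}_{\gamma\text{-norm}} L
  &= t^{M\gamma/2}\, \mathcal{D} L
   = t^{M\gamma/2}\, s^{M}\, \mathcal{D}' L' \\
  &= \left(\frac{t'}{s^{2}}\right)^{M\gamma/2} s^{M}\, \mathcal{D}' L'
   = s^{M(1-\gamma)}\, t'^{M\gamma/2}\, \mathcal{D}' L'
   = s^{M(1-\gamma)}\, \mathcal{D}'_{\gamma\text{-norm}} L' .
\end{align*}
```

The factor $s^{M(1-\gamma)}$ does not depend on t, so a maximum over scales at t in one domain corresponds to a maximum at t' = s² t in the other. The factor equals one only when γ = 1, which is the case in which the magnitude itself is preserved.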
4.2 The scale selection mechanism in practice
- So far the authors have proposed a general methodology for scale selection by detecting local maxima in feature responses over scales.
- Here, the authors shall not attempt to answer this question.
- Let us instead contend that the differential expression should at least be determined so as to capture the types of image structures under consideration.
- The general approach to scale selection that will be proposed is to use these maximal responses over scales in the stage of detecting image features, i.e., when establishing the existence of different types of image structures.
- The suggested framework naturally gives rise to two-stage algorithms, with feature detection at coarse scales followed by feature localization at finer scales.
4.3 Experiments: Scale-space signatures from real data
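- (To avoid the sensitivity to sign of these entities, and hence the polarity of the signal, $\operatorname{trace} \mathcal{H}_{norm} L$ and $\det \mathcal{H}_{norm} L$ have been squared before presentation.)
- These graphs are called the scale-space signatures of $(\operatorname{trace} \mathcal{H}_{norm} L)^2$ and $(\det \mathcal{H}_{norm} L)^2$, respectively.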
- This example illustrates that results in agreement with the proposed scale selection principle can be obtained also for real-world data (and for signals having a much richer frequency content than a single sine wave).
- The reason why these particular differential expressions have been selected here is because they constitute differential entities useful for blob detection; see e.g.
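To make the notion of a scale-space signature concrete, here is a minimal one-dimensional sketch. It is a simplification: the experiments described above use two-dimensional Hessian-based entities, whereas this example uses the squared γ-normalized second derivative of a 1-D signal, a sampled Gaussian for smoothing, and a central second difference. All function names are hypothetical. For a 1-D Gaussian bump of variance t0, the continuous theory with γ = 1 places the maximum of this signature at t = 2·t0.

```python
import math


def gaussian_bump(x, centre, t0):
    """Unit-height Gaussian bump of variance t0 (an assumed test pattern)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * t0))


def smoothed_value(signal, index, t):
    """Scale-space value L(index; t) using a sampled Gaussian kernel of variance t."""
    radius = int(4.0 * math.sqrt(t)) + 1
    acc = 0.0
    total = 0.0
    for k in range(-radius, radius + 1):
        w = math.exp(-(k * k) / (2.0 * t))
        j = min(max(index + k, 0), len(signal) - 1)  # replicate the border samples
        acc += w * signal[j]
        total += w
    return acc / total


def signature(signal, index, scales, gamma=1.0):
    """Squared gamma-normalized second derivative at one point, one value per scale."""
    values = []
    for t in scales:
        left, mid, right = (smoothed_value(signal, index + d, t) for d in (-1, 0, 1))
        l_xx = left - 2.0 * mid + right          # central second difference
        values.append((t ** gamma * l_xx) ** 2)  # squared to remove the sign
    return values


def selected_scale(signal, index, scales, gamma=1.0):
    """Scale at which the signature assumes its maximum."""
    values = signature(signal, index, scales, gamma)
    return scales[values.index(max(values))]


if __name__ == "__main__":
    signal = [gaussian_bump(x, 200, 16.0) for x in range(401)]
    scales = [2.0 * 2.0 ** (k / 4.0) for k in range(29)]  # t from 2 to 256
    print(selected_scale(signal, 200, scales))
```

The scale grid is geometric, so the selected scale is only accurate to within one grid step.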
4.4 Simultaneous detection of interesting points and scales
- In figure 2, the signatures of the normalized differential entities were computed at the central point in each image.
- These points were deliberately chosen to coincide with the centers of the sunflowers, where the blob response can be expected to be maximal under spatial perturbations.
- Specific examples of this idea will be worked out in more detail in the following sections.
- Referring to the invariance properties of local maxima over scales under rescalings of the input signal, the authors can observe that they transfer trivially to scale-space maxima.
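The idea of a scale-space maximum, a point that is a local maximum with respect to both space and scale, can be sketched in one dimension as follows. This is an illustrative simplification with hypothetical names: the response is −t^γ L_xx (positive at bright bumps), smoothing uses a sampled Gaussian, and a point is accepted if it exceeds a threshold and all eight neighbours on the (scale, position) grid.

```python
import math


def gaussian_bump(x, centre, t0):
    """Unit-height Gaussian bump of variance t0 (an assumed test pattern)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * t0))


def smooth(signal, t):
    """Convolve the whole signal with a sampled Gaussian of variance t."""
    radius = int(4.0 * math.sqrt(t)) + 1
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * t)) for k in offsets]
    total = sum(weights)
    n = len(signal)
    return [
        sum(w * signal[min(max(i + k, 0), n - 1)] for k, w in zip(offsets, weights)) / total
        for i in range(n)
    ]


def blob_response(signal, scales, gamma=1.0):
    """One row per scale; entries are -t^gamma * L_xx, positive at bright bumps."""
    rows = []
    for t in scales:
        smoothed = smooth(signal, t)
        row = [0.0] * len(smoothed)
        for i in range(1, len(smoothed) - 1):
            l_xx = smoothed[i - 1] - 2.0 * smoothed[i] + smoothed[i + 1]
            row[i] = -(t ** gamma) * l_xx
        rows.append(row)
    return rows


def scale_space_maxima(response, scales, threshold):
    """Points above the threshold that exceed all 8 neighbours in (scale, position)."""
    found = []
    for k in range(1, len(scales) - 1):
        for i in range(1, len(response[k]) - 1):
            value = response[k][i]
            if value <= threshold:
                continue
            neighbours = [response[k + dk][i + di]
                          for dk in (-1, 0, 1) for di in (-1, 0, 1)
                          if (dk, di) != (0, 0)]
            if all(value > other for other in neighbours):
                found.append((i, scales[k], value))
    return found


if __name__ == "__main__":
    signal = [gaussian_bump(x, 60, 4.0) + gaussian_bump(x, 180, 25.0) for x in range(261)]
    scales = [2.0 ** (k / 2.0) for k in range(15)]  # t from 1 to 128
    for position, t, strength in scale_space_maxima(blob_response(signal, scales), scales, 0.1):
        print(position, t, strength)
```

In this sketch each bump yields one maximum at its centre, with a larger selected scale for the wider bump, so position and size are obtained in a single search.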
5 Blob detection with automatic scale selection
- Every scale-space maximum has been graphically illustrated by a circle centered at the point at which the spatial maximum is assumed, and with the size determined such that the radius (measured in pixel units) is proportional to the scale at which the maximum is assumed (measured in dimension length).
- To reduce the number of blobs, a threshold on the maximum normalized response has been selected such that the 250 blobs having the maximum normalized responses according to (30) remain.
- The bottom row shows the result of superimposing these circles onto a bright copy of the original image, as well as corresponding results for the normalized scale-space extrema of the square of the determinant of the Hessian matrix.
- Corresponding experiments for a synthetic pattern (analysed in section 5.1) are given in figure 4.
- Observe how these conceptually very simple differential geometric descriptors give a very reasonable description of the blob-like structures in the image (in particular concerning the blob size) considering how little information is used in the processing.
5.1 Analysis of scale-space maxima for idealized model patterns
- Whereas the theoretical analysis in section 4.1 applies generally to large classes of differential invariants and input signals, one may ask how the scale selection method for blob detection performs in specific situations.
- The authors shall study two model patterns for which a closed-form solution of the diffusion equation can be calculated and a complete analytical study hence is feasible.
- Footnote 3: When detecting scale-space maxima in practice, there is, of course, no need to explicitly track the extrema along the extremum path in scale-space.
- There is a unique solution when the ratio ω2/ω1 is close to one, and three solutions when the ratio is sufficiently large.
5.2 Comparisons with fixed-scale blob detection
- Figure 6 shows the result of computing spatial maxima at different scales in the response of the Laplacian operator from the sine wave pattern in figure 4.
- At each scale, the 50 strongest responses have been extracted.
- As can be seen, small blobs are given the highest relative ranking at fine scales, whereas large blobs are given the highest relative ranking at coarse scales.
- Hence, a blob detector of this type (operating at a single predetermined scale) induces a bias towards image structures of a certain size.
- (As was shown above, the associated measure of blob strength is strictly scale invariant.)
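The size bias of a fixed-scale detector can be illustrated with a closed-form model. The sketch assumes a unit-contrast 2-D Gaussian blob f(x) = exp(−|x|²/(2·t0)), which is an illustrative choice and not necessarily one of the patterns used in the experiments. For this blob the Laplacian of the scale-space representation at the blob centre is −2·t0/(t0 + t)². Function names are hypothetical.

```python
def laplacian_at_centre(t0, t):
    """Laplacian at the centre of a unit-contrast 2-D Gaussian blob (variance t0) at scale t."""
    return -2.0 * t0 / (t0 + t) ** 2


def fixed_scale_strength(t0, t):
    """Magnitude of the unnormalized Laplacian response at a predetermined scale t."""
    return abs(laplacian_at_centre(t0, t))


def normalized_strength(t0, t, gamma=1.0):
    """Magnitude of the gamma-normalized Laplacian response t^gamma * Laplacian."""
    return abs(t ** gamma * laplacian_at_centre(t0, t))


def selected_scale(t0, scales):
    """Scale at which the normalized response is maximal for a blob of size t0."""
    return max(scales, key=lambda t: normalized_strength(t0, t))


if __name__ == "__main__":
    sizes = [1.0, 4.0, 16.0, 64.0]
    for t in sizes:
        favoured = max(sizes, key=lambda t0: fixed_scale_strength(t0, t))
        print("fixed scale", t, "ranks blob size", favoured, "highest")
    for t0 in sizes:
        t_sel = selected_scale(t0, sizes)
        print("blob size", t0, "selects scale", t_sel,
              "with strength", normalized_strength(t0, t_sel))
```

In this model each fixed scale ranks the blob of matching size highest, whereas with scale selection and γ = 1 every blob selects t = t0 and receives the same strength 1/2, regardless of its size.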
5.3 Applications of blob detection with automatic scale selection
- Following the previously presented arguments, the authors argue that a scale selection mechanism is an essential complement to any blob detector aimed at handling large size variations in the image structures.
- In addition, scale information associated with such adaptively computed image descriptors may serve as an important cue in its own right.
- In (Bretzner and Lindeberg 1996, 1998) an application to feature tracking is presented, where (i) the scale information constitutes a key component in the criterion for matching image features over time, and (ii) the scale selection mechanism is essential for the vision system to be able to capture objects under large size variations over time.
6.1 Selection of detection scales from normalized scale-space maxima
- A commonly used entity for junction detection is the curvature of level curves in intensity data multiplied by the gradient magnitude (Kitchen and Rosenfeld 1982; Dreschler and Nagel 1982; Koenderink and Richards 1988; Noble 1988; Deriche and Giraudon 1990; Blom 1992; Florack et al. 1992; Lindeberg 1994d).
- To reduce the number of junction candidates, the scale-space maxima have been sorted with respect to a saliency measure.
- Finally, the 50 most significant blobs according to this ranking have been displayed.
- Of course, thresholding on the magnitude of the operator response constitutes a coarse selective mechanism for feature detection.
- Nevertheless, note that this operation gives rise to a set of junction candidates with reasonable interpretations in the scene.
6.2 Analysis of scale-space maxima for diffuse junction models
- To obtain an intuitive understanding of the qualitative behaviour of the scale selection method in this case, let us analyse a simple junction model for which a closed-form analysis can be carried out without too much effort.
- Unfortunately, the equation that determines the position of the spatial maximum in κ̃2 over scales is non-trivial to handle (it contains a non-linear combination of the Gaussian function, the primitive function of the Gaussian, and polynomials).
- This function can be regarded as a coarse model of the behaviour at scales so coarse that the shape distortions are substantial and the overall shape of a finite-size object is severely affected.
- Hence, selecting scale levels (and spatial points) where κ̃2norm assumes maxima over scales can be expected to give rise to scale levels in the intermediate scale range (where a finite extent junction model constitutes a reasonable approximation).
6.3 Experiments: Scale-space signatures in junction detection
- Figure 9 illustrates these effects for synthetic L-junctions with varying degrees of diffuseness.
- In other words, the scale at which the maximum over scales is assumed indicates the spatial extent (the size) of the region for which a junction model is consistent with the grey-level data (in agreement with the suggested scale selection principle).
- It shows scale-space maxima of κ̃2norm computed from a synthetic image containing corner structures at different scales.
- The original greylevel image is shown in the ground plane, and each scale-space maximum has been graphically visualized by a sphere centered at the position (x0; t0) in scale-space at which the scale-space maximum was assumed.
- More results on corner detection, including a complementary mechanism for accurate corner localization, are presented in section 7.
7 Feature localization with automatic scale selection
- The scale selection methodology presented so far applies to the detection of image features, and the role of the scale selection mechanism is to estimate the approximate size of the image structures the feature detector responds to.
- Whereas this approach provides a conceptually simple way to express various feature detectors, such as a junction detector, which automatically adapts its scale levels to the local image structure, it is not guaranteed that the spatial positions of the scale-space maxima constitute accurate estimates of the corner locations.
- The local maxima over scales may be assumed at rather coarse scales, where the drift due to scale-space smoothing is substantial and adjacent features may interfere with each other.
- For this reason, it is natural to complement the initial feature detection step by an explicit feature localization stage.
- The subject of this section is to show how mechanisms for automatic scale selection can be formulated in this context, by minimizing normalized measures of inconsistency over scales.
7.1 Corner localization by local consistency
- Minimizing this expression corresponds to finding the point x that minimizes the weighted integral of the squares of the distances from x to all lx′ in the neighbourhood, see figure 12.
- ($D_{x'}(x)$ is the distance from $x$ to $l_{x'}$ multiplied by the gradient magnitude, and the window function implies that stronger weights are given to points in a neighbourhood of $x_0$.)
- The overall intention of this formulation is that for an image pattern containing a junction, the point x that minimizes (57) should constitute a better estimate of the projection of the physical junction than x0.
- Explicit solution in terms of local image statistics.
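The least-squares formulation above admits a closed-form solution, sketched here for a finite set of sample points. Since the distance from x to the edge tangent line through x' multiplied by the gradient magnitude equals |∇L(x')·(x − x')|, the weighted sum of squares is a quadratic form xᵀAx − 2bᵀx + c, minimized by x = A⁻¹b with residual c − bᵀA⁻¹b. The function name, the discrete sum in place of the integral, and returning trace A (a natural quantity for normalizing the residual) are assumptions of this sketch.

```python
def localize_corner(points, gradients, weights=None):
    """Point minimizing sum_i w_i * (g_i . (x - p_i))**2, plus residual and trace of A.

    points:    list of (x1, x2) sample positions x'
    gradients: list of (g1, g2) image gradients at those positions
    weights:   optional window weights (defaults to 1 for every sample)
    """
    if weights is None:
        weights = [1.0] * len(points)
    a11 = a12 = a22 = b1 = b2 = c = 0.0
    for (p1, p2), (g1, g2), w in zip(points, gradients, weights):
        gp = g1 * p1 + g2 * p2          # gradient projected onto the sample position
        a11 += w * g1 * g1
        a12 += w * g1 * g2
        a22 += w * g2 * g2
        b1 += w * g1 * gp
        b2 += w * g2 * gp
        c += w * gp * gp
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gradients span only one direction; no unique minimum")
    x1 = (a22 * b1 - a12 * b2) / det    # x = A^{-1} b for the symmetric 2x2 matrix A
    x2 = (a11 * b2 - a12 * b1) / det
    d_min = c - (b1 * x1 + b2 * x2)     # residual c - b^T A^{-1} b
    return (x1, x2), d_min, a11 + a22


if __name__ == "__main__":
    # Two straight edges meeting at (3, 5): one along x2 = 5, one along x1 = 3.
    points = [(4.0, 5.0), (6.0, 5.0), (3.0, 7.0), (3.0, 9.0)]
    gradients = [(0.0, 1.0), (0.0, 1.0), (2.0, 0.0), (2.0, 0.0)]
    print(localize_corner(points, gradients))
```

For this ideal junction, where all edge tangents pass through one point, the estimate coincides with the junction and the residual is zero.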
7.2 Automatic selection of localization scales
- The formulation in the previous section, however, leaves two major problems open. Moreover, let the scale value of this window function be proportional to the detection scale $t_0$ at which the maximum over scales in $\tilde{\kappa}^2_{norm}$ was assumed.
- Specifically, scale selection by minimizing the normalized residual r̃ (65) over scales corresponds to selecting the scale that minimizes the estimated inaccuracy in the localization estimate.
- Thus, when smoothing is necessary, the residual will decrease.
- This can be easily understood by observing that for an ideal polygon-type junction (consisting of regions of uniform grey-level delimited by straight edges), all edge tangents meet at the junction point, which means that the residual d̃min is exactly zero.
7.3 Experiments: Choice of localization scale
- Figure 13 and figure 14 show the result of applying this scale selection mechanism to a sharp and a diffuse corner with different amounts of added white Gaussian noise.
- As can be seen, the results agree with the qualitative discussion above.
- For each noise level, this table gives the scale at which the normalized residual assumes its minimum over scales, as well as the scale at which the estimate with the minimum absolute error over scales is obtained.
- The results show that the normalized residual serves as an estimate of the inaccuracy in the corner localization estimate, and specifically that the scale at which the minimum over scales in d̃min is assumed is a reasonable estimate of the scale at which the authors have the localization estimate with the minimum absolute error.
- Figure 15 shows the result of applying the composed junction localization stage to the junction candidates in figure 8.
7.4 Composed scheme for junction detection and localization
- To summarize, the composed two-stage scheme for junction detection and junction localization consists of the following processing steps (see footnote 5): 1. Detection.
- Detect scale-space maxima in the square of the normalized rescaled level curve curvature $\tilde{\kappa}_{norm} = t^{2\gamma}\, \tilde{\kappa} = t^{2\gamma} (L_{x_2}^2 L_{x_1 x_1} - 2 L_{x_1} L_{x_2} L_{x_1 x_2} + L_{x_1}^2 L_{x_2 x_2})$ (or some other suitable normalized differential entity).
- This generates a set of junction candidates.
- Footnote 5: Besides the general descriptions given in previous sections.
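The differential entity used in the detection step can be evaluated pointwise as sketched below. The sketch assumes that the input grid already holds the scale-space representation L at scale t, that x1 runs along columns and x2 along rows, and that derivatives are approximated by central differences. These conventions and the function names are illustrative only.

```python
def rescaled_curvature(L, r, c):
    """Rescaled level curve curvature at pixel (r, c) of a pre-smoothed grid L.

    L is a list of rows; x1 is the column direction and x2 the row direction.
    """
    l_x1 = (L[r][c + 1] - L[r][c - 1]) / 2.0
    l_x2 = (L[r + 1][c] - L[r - 1][c]) / 2.0
    l_x1x1 = L[r][c + 1] - 2.0 * L[r][c] + L[r][c - 1]
    l_x2x2 = L[r + 1][c] - 2.0 * L[r][c] + L[r - 1][c]
    l_x1x2 = (L[r + 1][c + 1] - L[r + 1][c - 1]
              - L[r - 1][c + 1] + L[r - 1][c - 1]) / 4.0
    return l_x2 ** 2 * l_x1x1 - 2.0 * l_x1 * l_x2 * l_x1x2 + l_x1 ** 2 * l_x2x2


def normalized_curvature_squared(L, r, c, t, gamma=1.0):
    """Square of t^(2*gamma) times the rescaled curvature, the quantity to maximize."""
    return (t ** (2.0 * gamma) * rescaled_curvature(L, r, c)) ** 2


if __name__ == "__main__":
    # Saddle-shaped test pattern L(x1, x2) = x1 * x2, for which kappa = -2 * x1 * x2.
    grid = [[float(row * col) for col in range(5)] for row in range(5)]
    print(rescaled_curvature(grid, 2, 3))
    print(normalized_curvature_squared(grid, 2, 3, t=2.0))
```

In a full detector this value would be computed at every pixel and every scale, and candidates would be taken at the local maxima over both space and scale.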
7.5 Further experiments
- Concerning the number of junction candidates to be processed and passed on to later processing stages, the authors have not made any attempts in this work to decide automatically how many of the extracted junction candidates correspond to physical junctions in the world.
- The authors argue that such decisions require integration with higher-level reasoning and verification processes, and may be extremely hard to make at the earliest processing stages unless additional information is available about the external conditions.
- For this reason, this module only aims at computing an early ranking of image features in order of significance, which can be used by a vision system for processing features in decreasing order of significance.
- Footnote 6: An integrated vision system for analysing junctions by actively zooming in to interesting structures is presented in (Brunnström et al. 1992; Lindeberg 1993a).
[Figure panel titles: "200 strongest junctions" (two panels)]
- In line with this idea, the results are shown in terms of the N strongest junction candidates for different (manually chosen) values of N .
- In figure 17, which shows corresponding examples for more cluttered scenes, the number of junctions displayed has been increased to 100 and 200.
- Notably, this number of junction candidates constitutes the only essential tuning parameter of the composed algorithm.
- Here, the 10 most significant junctions have been processed.
- The table in figure 19 shows numerical values exemplifying how large the localization errors can be in the different processing stages.
7.6 Applications of corner detection with automatic scale selection
- In (Lindeberg and Li 1995, 1997) it is shown how the support region associated with each junction allows for conceptually simple matching between junctions and edges based on spatial overlap only and without any need for providing externally determined thresholds on e.g. distance.
- Then, the matching relations between edges and junction cues that arise in this way are used in a pre-processing stage for classifying edges into straight and curved.
- In (Bretzner and Lindeberg 1996) it is demonstrated how these support regions can be used for simplifying matching of junctions over time in tracking algorithms.
- Specifically, it is shown that the scale selection mechanism in the junction detector is essential to capture junctions that undergo large size changes.
- In (Lindeberg 1995a, 1996d) a scale selection principle for stereo matching and flow estimation is presented, which also involves the extension of a fixed scale least squares estimation problem to optimization over multiple scales.
7.7 Extensions of the junction detection method
- The main purpose of the presentation in this section has been to make explicit how a scale selection mechanism can be incorporated into a junction detector.
- There are a few additional mechanisms which are natural to include if the aim is to construct a stand-alone junction detector.
- Concerning the ranking on significance, the authors can conceive linking the maxima of the junction responses across scales in a similar way as done in the scale-space primal sketch (Lindeberg 1993a), register scale-space events such as bifurcations, and include the scale-space lifetime of each junction response into the significance measure.
- Concerning the region of interest associated with each junction candidate, the authors have throughout this work represented the support region of a scale-space maximum by a circle with area reflecting the detection scale.
- A possible limitation of this approach is that nearby junctions may lead to interference effects in operations such as the localization stage.
7.8 Extensions to edge detection
- Concerning more general applications of the proposed methodology, it should be noted that the scale selection method for junction localization applies to edge detection as well.
- The columns show from left to right; (i) the local grey-level pattern, (ii) the signature of d̃min computed at the central point, and (iii) edges detected at the scale td = argmin d̃min at which the minimum over scales in d̃min was assumed.
- In the first row, the authors can see that when performing edge detection at argmin d̃min they obtain a coherent edge descriptor corresponding to the dominant edge structure in this region.
- In the second row, a large amount of white Gaussian noise has been added to the grey-level image, and the minimum over scales is assumed at a much coarser scale.
- Concerning these experiments it should be pointed out that they are mainly intended to demonstrate the potential in applying the proposed method for selecting localization scales to the problem of edge detection, and that further processing steps are needed to give a complete algorithm.
8 Dense frequency estimation
- So far, the authors have seen how the scale selection methodology can be applied to the detection of sparse feature points.
- An obvious problem that arises if the authors were to base a scale selection mechanism for computing dense image descriptors on a partial derivative of the intensity function, such as the Laplacian operator, is that there would be large spatial variations in the operator response.
- A common methodology in signal processing for reducing this so-called phase dependency is by using quadrature filter pairs defined (from a Hilbert transform) in such a way that the Euclidean sum of the filter responses will be constant for any sine wave.
- (As will be shown below, this scale value is of the same order of magnitude as the scales that maximize QL over scales; compare also with section 3.)
- In the abovementioned sources, these specific entities and normalization parameters are shown to be useful for edge detection and ridge detection with automatic scale selection.
9.3 Relations to previous work
- Such L1-normalized kernels of first order have been used, for example, in edge detection and edge classification by (Korn 1988), (Mallat and Zhong 1992), and (Zhang and Bergholm 1993), and in pyramids by (Crowley and Parker 1984).
- More generally, evolution properties across scales of wavelet transforms have been used by (Mallat and Hwang 1992) for characterizing local Lipschitz exponents of singularities.
- There is also a connection to the “top point” representation proposed by (Johansen et al. 1986) in the sense that the points in the scale-space at which bifurcations occur serve to delimit extremum paths with different topology.
- A main difference between the scale selection mechanism suggested here and the work in (Lindeberg 1991) and (Mallat and Hwang 1992), however, is that here it is shown how these notions can be applied to large classes of non-linear differential invariants computed in a scale-space representation.
- Moreover, feature detection algorithms have been formulated with integrated scale selection mechanisms and it has been shown how different derivative normalization approaches lead to different classes of differential expressions for which the scale selection mechanism commutes with rescalings of the input pattern.
10 Summary and discussion
- The authors have argued that the subject of scale selection is essential to many problems in computer vision and automated image analysis.
- A general scale selection principle has been presented stating that in the absence of other evidence, coarse estimates of the size of image structures can be computed from the scales at which normalized differential geometric descriptors assume maxima over scales.
- Image features are first detected at locally adapted coarse scales, and then localized to finer scales in a second processing stage.
- Whereas the general advantages of such a two-stage approach to feature detection are well-known in the literature, a major contribution here is that explicit mechanisms are provided for automatic selection of the detection scales as well as the localization scales.
- Moreover, these processing stages are integrated into algorithms which are essentially free from other tuning parameters than the number of features of interest.
10.1 Technical contributions
- At a technically more detailed level, some of the main contributions are that:
- It is emphasized how the evolution properties over scales of normalized scale-space derivatives differ from those of traditional spatial derivatives.
- A general principle for scale selection has been proposed, stating that extrema over scales in the signature of normalized differential entities are useful in the stage of detecting image features.
- The problem of junction detection is treated more extensively, and the resulting method is the first junction detection algorithm with automatic scale selection.
- Specifically, it is shown how localization scales can be selected automatically by minimizing a certain normalized residual across scales.