Speeded-Up Robust Features (SURF)
Summary (5 min read)
1. Introduction
- The task of finding point correspondences between two images of the same scene or object is part of many computer vision applications.
- The authors' goal was to develop both a detector and a descriptor that, in comparison to the state of the art, are fast to compute while not sacrificing performance.
- Skew, anisotropic scaling, and perspective effects are assumed to be second-order effects that are covered to some degree by the overall robustness of the descriptor.
- In section 3, the authors describe the strategy applied for fast and robust interest point detection.
- Both applications highlight SURF's benefits in terms of speed and robustness as opposed to other strategies.
2.1. Interest Point Detection
- The most widely used detector is probably the Harris corner detector [15], proposed back in 1988.
- Mikolajczyk and Schmid [26] refined this method, creating robust and scale-invariant feature detectors with high repeatability, which they coined Harris-Laplace and Hessian-Laplace.
- They used a (scale-adapted) Harris measure or the determinant of the Hessian matrix to select the location, and the Laplacian to select the scale.
- They seem less amenable to acceleration though.
- These fall outside the scope of this article.
2.2. Interest Point Description
- An even larger variety of feature descriptors has been proposed, like Gaussian derivatives [11], moment invariants [32], complex features [1, 36], steerable filters [12], phase-based local features [6], and descriptors representing the distribution of smaller-scale features within the interest point neighbourhood.
- The latter, introduced by Lowe [24], have been shown to outperform the others [28].
- In the same paper [30], the authors proposed a variant of SIFT, called GLOH, which proved to be even more distinctive with the same number of dimensions.
- It is distinctive and relatively fast, which is crucial for on-line applications.
- Together with the descriptor's low dimensionality, any matching algorithm is bound to perform faster.
3. Interest Point Detection
- The authors' approach for interest point detection uses a very basic Hessian-matrix approximation.
- This lends itself to the use of integral images as made popular by Viola and Jones [41], which reduces the computation time drastically.
3.1. Integral Images
- In order to make the article more self-contained, the authors briefly discuss the concept of integral images.
- They allow for fast computation of box-type convolution filters.
- Once the integral image has been computed, it takes three additions to calculate the sum of the intensities over any upright, rectangular area.
- Hence, the calculation time is independent of its size.
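The constant-time box sum described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function names `integral_image` and `box_sum` and the zero-padded table layout are assumptions made for this example.

```python
from typing import List


def integral_image(img: List[List[float]]) -> List[List[float]]:
    """Return an (H+1) x (W+1) table where ii[y][x] is the sum of img[0:y][0:x].

    The extra row and column of zeros (an assumption of this sketch) avoid
    special cases at the image border.
    """
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[float]], top: int, left: int, bottom: int, right: int) -> float:
    """Sum of img[top:bottom][left:right]: four lookups, three additions/subtractions."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # sum over the whole image
print(box_sum(ii, 1, 1, 3, 3))  # sum over the lower-right 2 x 2 block
```

The cost of `box_sum` does not depend on the rectangle's size, which is what makes large box filters as cheap as small ones.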
3.2. Hessian Matrix Based Interest Points
- The authors base their detector on the Hessian matrix because of its good performance in accuracy.
- As shown in the results section and figure 3, the performance is comparable to or better than that of the discretised and cropped Gaussians.
- The relative weight w of the filter responses is used to balance the expression for the Hessian's determinant.
- Notice that for theoretical correctness, the weighting changes depending on the scale.
- The approximated determinant of the Hessian represents the blob response in the image at location x.
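For reference, the weighted determinant mentioned above is usually written as follows. The symbols $D_{yy}$ and $D_{xy}$ follow the caption of figure 2; $D_{xx}$ is the analogous box-filter response in the x direction and is an assumption of notation here, since this summary does not spell the formula out.

```latex
\det(\mathcal{H}_{\text{approx}}) = D_{xx}\, D_{yy} - \left( w\, D_{xy} \right)^{2}
```

The original paper reports a value of roughly $w \approx 0.9$ and, although the weight would in theory change with scale, keeps it constant in practice.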
3.3. Scale Space Representation
- Interest points need to be found at different scales, not least because the search of correspondences often requires their comparison in images where they are seen at different scales.
- The images are repeatedly smoothed with a Gaussian and then sub-sampled in order to achieve a higher level of the pyramid.
- Therefore, the scale space is analysed by up-scaling the filter size rather than iteratively reducing the image size (see figure 4).
- In total, an octave encompasses a scaling factor of 2 (which implies that one needs to more than double the filter size, see below).
- At the same time, the sampling intervals for the extraction of the interest points can be doubled as well for every new octave.
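The octave structure can be made concrete with a small sketch. The specific numbers used here (a 9 × 9 base filter, a size step of 6 that doubles per octave, and a base scale of 1.2) are taken from the original paper rather than from this summary, so treat them as assumptions; `filter_sizes` and `scale_of` are hypothetical helper names.

```python
from typing import List


def filter_sizes(num_octaves: int = 3, layers_per_octave: int = 4,
                 base_size: int = 9, base_step: int = 6) -> List[List[int]]:
    """Box-filter side lengths per octave.

    Assumed scheme: each octave starts at the second filter size of the
    previous octave, and the size increment doubles from octave to octave.
    """
    sizes = []
    start, step = base_size, base_step
    for _ in range(num_octaves):
        octave = [start + i * step for i in range(layers_per_octave)]
        sizes.append(octave)
        start = octave[1]
        step *= 2
    return sizes


def scale_of(size: int, base_size: int = 9, base_scale: float = 1.2) -> float:
    """Approximate Gaussian scale for a given filter size (assumed linear relation)."""
    return base_scale * size / base_size


for octave in filter_sizes():
    print(octave, [round(scale_of(s), 2) for s in octave])
```

Within the first octave the filter grows from 9 to 27, a factor of three, which is consistent with the remark that the filter size must more than double to cover a scaling factor of 2.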
3.4. Interest Point Localisation
- Specifically, the authors use a fast variant introduced by Neubeck and Van Gool [33].
- The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown et al. [5].
- Scale space interpolation is especially important in their case, as the difference in scale between the first layers of every octave is relatively large.
- Figure 8 shows an example of the detected interest points using their 'Fast-Hessian' detector.
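As a rough illustration of the localisation step, the sketch below finds maxima of a response stack by brute force. It assumes a 3 × 3 × 3 neighbourhood in scale and image space, as in the original paper; it is not the fast Neubeck and Van Gool variant, and it omits the sub-pixel and sub-scale interpolation. The name `local_maxima_3x3x3` and the `stack[scale][y][x]` layout are assumptions of this example.

```python
from typing import List, Tuple


def local_maxima_3x3x3(stack: List[List[List[float]]],
                       threshold: float) -> List[Tuple[int, int, int]]:
    """Return (scale, y, x) indices whose response exceeds the threshold and
    is strictly larger than all 26 neighbours in scale and image space."""
    offsets = [(ds, dy, dx)
               for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (ds, dy, dx) != (0, 0, 0)]
    found = []
    for s in range(1, len(stack) - 1):
        for y in range(1, len(stack[s]) - 1):
            for x in range(1, len(stack[s][y]) - 1):
                value = stack[s][y][x]
                if value <= threshold:
                    continue
                if all(value > stack[s + ds][y + dy][x + dx] for ds, dy, dx in offsets):
                    found.append((s, y, x))
    return found


# Toy stack: three scales of 3 x 3 responses with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
print(local_maxima_3x3x3(stack, threshold=0.5))
```

Each maximum found this way would then be refined by the interpolation step described above.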
4. Interest Point Description and Matching
- The authors' descriptor describes the distribution of the intensity content within the interest point neighbourhood, similar to the gradient information extracted by SIFT [24] and its variants.
- The authors build on the distribution of first-order Haar wavelet responses in x and y direction rather than the gradient, exploit integral images for speed, and use only 64 dimensions.
- The authors refer to their detector-descriptor scheme as SURF (Speeded-Up Robust Features).
- The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point.
- These three steps are explained in the following.
4.1. Orientation Assignment
- For that purpose, the authors first calculate the Haar wavelet responses in x and y direction within a circular neighbourhood of radius 6s around the interest point, with s the scale at which the interest point was detected.
- In keeping with the rest, the size of the wavelets is also scale dependent and set to a side length of 4s.
- Therefore, the authors can again use integral images for fast filtering.
- The two summed responses then yield a local orientation vector.
- Small window sizes fire on single dominating gradients, while large sizes tend to yield maxima in vector length that are not pronounced.
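The sliding-window idea can be sketched as follows. This is a simplified illustration under stated assumptions: the responses are given as plain `(dx, dy)` pairs, the Gaussian weighting used in the original paper is omitted, and the window width is left as a parameter because this summary does not state it. The function name `dominant_orientation` and the `steps` parameter are hypothetical.

```python
import math
from typing import List, Tuple


def dominant_orientation(responses: List[Tuple[float, float]],
                         window: float, steps: int = 72) -> float:
    """Slide an angular window around the circle, sum the (dx, dy) responses
    falling inside it, and return the angle of the longest summed vector."""
    two_pi = 2.0 * math.pi
    best_len2, best_angle = -1.0, 0.0
    for k in range(steps):
        start = two_pi * k / steps
        sum_x = sum_y = 0.0
        for dx, dy in responses:
            angle = math.atan2(dy, dx) % two_pi
            if (angle - start) % two_pi < window:
                sum_x += dx
                sum_y += dy
        len2 = sum_x * sum_x + sum_y * sum_y
        if len2 > best_len2:
            best_len2, best_angle = len2, math.atan2(sum_y, sum_x)
    return best_angle


# Two responses pointing roughly along +x dominate one weak response along -y.
responses = [(1.0, 0.0), (0.9, 0.1), (0.0, -0.2)]
angle = dominant_orientation(responses, window=math.pi / 3)  # window width is illustrative
print(math.degrees(angle))
```

The `window` argument makes the trade-off above easy to explore: a very narrow window reacts to a single strong gradient, while a very wide one averages almost everything and flattens the maximum.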
4.2. Descriptor based on Sum of Haar Wavelet Responses
- For the extraction of the descriptor, the first step consists of constructing a square region centred around the interest point and oriented along the orientation selected in the previous section.
- "Horizontal" and "vertical" here are defined in relation to the selected interest point orientation.
- The authors then varied the number of sample points and sub-regions.
- The 4 × 4 sub-region division solution provided the best results, see also section 5.
- On the other hand, the short descriptor with 3 × 3 sub-regions (SURF-36) performs slightly worse, but allows for very fast matching and is still acceptable in comparison to other descriptors in the literature.
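The sub-region sums that make up the descriptor (see the caption of figure 12) can be sketched as below. This is a simplified illustration: it assumes the wavelet responses `dx` and `dy` have already been sampled on a square grid aligned with the interest point orientation, it omits the Gaussian weighting of the original paper, and the final normalisation to unit length follows the original paper rather than this summary. The name `surf_descriptor` is hypothetical.

```python
import math
from typing import List


def surf_descriptor(dx: List[List[float]], dy: List[List[float]],
                    grid: int = 4) -> List[float]:
    """Concatenate (sum dx, sum dy, sum |dx|, sum |dy|) over grid x grid
    sub-regions and normalise the result to unit length."""
    n = len(dx)
    cell = n // grid  # samples per sub-region side; assumes n is a multiple of grid
    vec: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            s_dx = s_dy = a_dx = a_dy = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    s_dx += dx[y][x]
                    s_dy += dy[y][x]
                    a_dx += abs(dx[y][x])
                    a_dy += abs(dy[y][x])
            vec.extend([s_dx, s_dy, a_dx, a_dy])
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Toy input: intensity increasing steadily in x, so dx is constant and dy is zero.
dx = [[1.0] * 20 for _ in range(20)]
dy = [[0.0] * 20 for _ in range(20)]
print(len(surf_descriptor(dx, dy, grid=4)))            # 4 * 4 * 4 entries
print(len(surf_descriptor(dx[:15], dy[:15], grid=3)))  # 3 * 3 * 4 entries
```

With four values per sub-region, a 4 × 4 grid gives the 64-dimensional descriptor and a 3 × 3 grid gives SURF-36.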
4.3. Fast Indexing for Matching
- For fast indexing during the matching stage, the sign of the Laplacian (i.e. the trace of the Hessian matrix) for the underlying interest point is included.
- Typically, the interest points are found at blob-type structures.
- The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation.
- This feature is available at no extra computational cost as it was already computed during the detection phase.
- Note that this is also of advantage for more advanced indexing methods.
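A minimal sketch of how the sign of the Laplacian can prune candidates during matching is shown below. The feature representation (a `(sign, descriptor)` tuple), the squared Euclidean distance, and the function names are assumptions made for illustration; the source does not prescribe a particular data layout.

```python
from typing import List, Optional, Sequence, Tuple

# (sign of Laplacian as +1 or -1, descriptor vector); layout assumed for this sketch
Feature = Tuple[int, Sequence[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_same_sign(query: Feature, candidates: List[Feature]) -> Optional[int]:
    """Index of the closest candidate with the same Laplacian sign, or None.

    Candidates with the opposite sign are skipped before any descriptor
    distance is computed, which is where the speed-up comes from.
    """
    q_sign, q_desc = query
    best_index, best_dist = None, float("inf")
    for i, (sign, desc) in enumerate(candidates):
        if sign != q_sign:
            continue
        dist = squared_distance(q_desc, desc)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


query = (1, [0.0, 1.0])
candidates = [(-1, [0.0, 1.0]), (1, [0.1, 0.9]), (1, [1.0, 0.0])]
print(nearest_same_sign(query, candidates))
```

In the toy data, the first candidate has an identical descriptor but the opposite sign, so it is never compared; this mirrors the bright-blob versus dark-blob distinction described above.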
5. Results
- The following presents both simulated and real-world results.
- First, the authors evaluate the effect of some parameter settings and show the overall performance of their detector and descriptor based on a standard evaluation set.
- Then, the authors describe two possible applications.
- Taking this application to image registration a bit further, the authors focus in this article on the more difficult problem of camera calibration and 3D reconstruction, also in wide-baseline cases.
- SURF manages to calibrate the cameras even in challenging cases reliably and accurately.
5.1. Experimental Evaluation and Parameter Settings
- The authors tested their detector using the image sequences and testing software provided by Mikolajczyk.
- The evaluation criterion is the repeatability score.
- The test sequences comprise images of real textured and structured scenes.
- There are different types of geometric and photometric transformations, like changing viewpoints, zoom and rotation, image blur, lighting changes and JPEG compression.
5.1.1. SURF Detector
- The authors tested two versions of their Fast-Hessian detector, depending on the initial Gaussian derivative filter size.
- The thresholds were adapted according to the number of interest points found with the DoG detector.
- The FH-15 detector is more than three times faster than DoG and more than four times faster than Hessian-Laplace (see also table 1).
- The repeatability scores for the Graffiti sequence are comparable for all detectors.
- Hence, these deformations have to be accounted for by the overall robustness of the features.
5.1.2. SURF Descriptor
- Here, the authors focus on two options offered by the SURF descriptor and their effect on recall/precision.
- Firstly, the number of divisions of the square grid in figure 12, and hence the descriptor size, has a major impact on the matching speed.
- Secondly, the authors consider the extended descriptor as described above.
- Overall, the effect of the extended version is minimal.
- Here, the authors only show a comparison with two other prominent description schemes (SIFT [24] and GLOH [30]), again averaged over the test sequences. SURF-64 turns out to perform best.
5.2. Application to 3D
- The authors evaluate the accuracy of their Fast-Hessian detector for the application of camera self-calibration and 3D reconstruction.
- The first evaluation compares different state-of-the-art interest point detectors for the two-view case.
- The second evaluation considers the N-view case for camera self-calibration and dense 3D reconstruction from multiple images, some taken under wide-baseline conditions.
5.2.1. 2-view Case
- In order to evaluate the performance of different interest point detection schemes for camera calibration and 3D reconstruction, the authors created a controlled environment.
- A good scene for such an evaluation consists of two highly textured planes forming a right angle (measured as 88.6° in their case); see figure 20.
- Principal point and aspect ratio are known.
- As the number of correct matches is an important factor for the accuracy, the authors adjusted the interest point detectors' parameters so that after matching, they are left with 800 correct matches (matches not belonging to the angle are filtered).
- Table 2 shows these quantitative results for their two versions of the Fast-Hessian detector (FH-9 and FH-15), the DoG features of SIFT [24], and the Hessian- and Harris-Laplace detectors proposed by Mikolajczyk and Schmid [29].
5.2.2. N-view Case
- The SURF detection and description algorithms have been integrated with the Epoch 3D Webservice of the VISICS research group at the K.U. Leuven.
- There, the calibration of the cameras and dense depth maps are computed automatically using these images only [40].
- The previous procedure using Harris corners and normalised cross correlation of image windows has problems matching such wide-baseline images.
- Furthermore, the DoG detector combined with SIFT description failed on some image sequences, where SURF succeeded in calibrating all the cameras accurately.
- The vase is easily recognisable even in the sparse 3D model.
5.3. Application to Object Recognition
- Bay et al. [3] already demonstrated the usefulness of SURF in a simple object detection task.
- The basis for this was a publicly available implementation of two bag-of-words classifiers [10].
- While this is a rather simple test set for object recognition in general, it definitely serves the purpose of comparing the performance of the actual descriptors.
- As can be seen, the upright counterparts for both SURF-128 and SURF-64 perform best.
- These positive results indicate that SURF should be very well suited for tasks in object detection, object recognition or image retrieval.
6. Conclusion and Outlook
- The authors presented a fast and performant scale- and rotation-invariant interest point detector and descriptor.
- The important speed gain is due to the use of integral images, which drastically reduce the number of operations for simple box convolutions, independent of the chosen scale.
- The high repeatability is advantageous for camera self-calibration, where an accurate interest point detection has a direct impact on the accuracy of the camera self-calibration and therefore on the quality of the resulting 3D model.
- The simplicity and again the use of integral images make their descriptor competitive in terms of speed.
- The latest version of SURF is available for public download.
Figures (27)
- Fig. 1. Using integral images, it takes only three additions and four memory accesses to calculate the sum of intensities inside a rectangular region of any size.
- Fig. 2. Left to right: the (discretised and cropped) Gaussian second order partial derivative in y- (Lyy) and xy-direction (Lxy), respectively; our approximation for the second order Gaussian partial derivative in y- (Dyy) and xy-direction (Dxy). The grey regions are equal to zero.
- Fig. 3. Top: Repeatability score for image rotation of up to 180 degrees. Hessian-based detectors have in general a lower repeatability score for angles around uneven multiples of π/4. Bottom: Sample images from the Van Gogh sequence that was used. Fast-Hessian is the more accurate version of our detector (FH-15), as explained in section 3.3.
- Fig. 4. Instead of iteratively reducing the image size (left), the use of integral images allows the up-scaling of the filter at constant cost (right).
- Fig. 8. Detected interest points for a Sunflower field. This kind of scene shows the nature of the features obtained using Hessian-based detectors.
- Fig. 9. Haar wavelet filters to compute the responses in x (left) and y direction (right). The dark parts have the weight −1 and the light parts +1.
- Fig. 10. Orientation assignment: A sliding orientation window of size [...]
- Fig. 11. Detail of the Graffiti scene showing the size of the oriented descriptor window at different scales.
- Fig. 12. To build the descriptor, an oriented quadratic grid with 4 × 4 square sub-regions is laid over the interest point (left). For each square, the wavelet responses are computed. The 2 × 2 sub-divisions of each square correspond to the actual fields of the descriptor. These are the sums dx, |dx|, dy, and |dy|, computed relatively to the orientation of the grid (right).
- Fig. 13. The descriptor entries of a sub-region represent the nature of the underlying intensity pattern. Left: In case of a homogeneous region, all values are relatively low. Middle: In presence of frequencies in x direction, the value of ∑ |dx| is high, but all others remain low. If the intensity is gradually increasing in x direction, both values ∑ dx and ∑ |dx| are high.
- Fig. 14. Due to the global integration of SURF’s descriptor, it stays more robust to various image perturbations than the more locally operating SIFT descriptor.
- Fig. 15. If the contrast between two interest points is different (dark on light background vs. light on dark background), the candidate is not considered a valuable match.
- Fig. 18. Recall-precision for nearest neighbor ratio matching for varying side length of square grid. A maximum is attained for a square of 4 × 4. Figures averaged over 8 image pairs of Mikolajczyk’s database. Top: standard descriptor, bottom: extended descriptor.
- Fig. 19. Recall-precision for nearest neighbor ratio matching for different description schemes, evaluated on SURF keypoints. Figures averaged over 8 image pairs of Mikolajczyk’s database.
- Fig. 20. Input images for the quantitative detector evaluation. This represents a good scene choice for the comparison of different types of interest point detectors, as its components are simple geometric elements.
- Fig. 21. Orthogonal projection of the reconstructed angle shown in figure 20.
- Fig. 22. 3D reconstruction with KU-Leuven’s 3D webservice. Left: One of the 13 input images for the camera calibration. Right: Position of the reconstructed cameras and sparse 3D model of the vase.
- Fig. 23. 3D reconstruction with KU-Leuven’s 3D webservice. Top row: The 3 input images of a detail of the San Marco Cathedral in Venice. Middle row: Samples of the textured dense reconstruction. Bottom row: un-textured dense reconstruction. The quality of the dense 3D model directly reflects the quality of the camera calibration. The images were taken by Maurizio Forte, CNR-ITABC, Rome.
- Fig. 25. Comparison of different options for the SURF descriptor for a naive Bayes classifier working on a bag-of-words representation. The descriptor was evaluated on SURF keypoints. Top: standard, bottom: extended descriptor.
- Table 1. Thresholds, number of detected points and calculation time for the detectors in our comparison. (First image of Graffiti scene, 800 × 640)
- Table 2. Comparison of different interest point detectors for the application of camera calibration and 3D reconstruction. The true angle is 88.6°.