# SURF: speeded up robust features

## Summary (2 min read)

### 1 Introduction

- The task of finding correspondences between two images of the same scene or object is part of many computer vision applications.
- The authors' goal is to develop both a detector and a descriptor that are faster to compute than the state of the art without sacrificing performance.
- Concerning the photometric deformations, the authors assume a simple linear model with a scale factor and offset.
- Section 2 describes related work, on which their results are founded.

### 3 Fast-Hessian Detector

- The authors base their detector on the Hessian matrix because of its good performance in computation time and accuracy.
- Because box filters applied to an integral image cost the same at any size, the scale space is analysed by up-scaling the filter size rather than iteratively reducing the image size.
- At larger scales, the step between consecutive filter sizes should also scale accordingly.
- As the ratios of their filter layout remain constant after scaling, the approximated Gaussian derivatives scale accordingly.
- Fig. 2 (left) shows an example of the detected interest points using their 'Fast-Hessian' detector.

### 4 SURF Descriptor

- The good performance of SIFT compared to other descriptors [8] is remarkable.
- Its mixing of crudely localised information and the distribution of gradient related features seems to yield good distinctive power while fending off the effects of localisation errors in terms of scale or space.
- The proposed SURF descriptor is based on similar properties, with a complexity stripped down even further.
- The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point.
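- The second step constructs a square region aligned to the selected orientation and extracts the SURF descriptor from it.
- These two steps are now explained in turn.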

### 4.1 Orientation Assignment

- To be invariant to rotation, the authors first calculate the Haar-wavelet responses in the x and y directions (shown in Fig. 2) in a circular neighbourhood of radius 6s around the interest point, where s is the scale at which the interest point was detected.
- The authors again use integral images for fast filtering.
- The horizontal and vertical responses within a sliding orientation window covering an angle of π/3 are summed, and the two sums yield a new vector.
- The longest such vector lends its orientation to the interest point.
- The size of the sliding window is a parameter chosen experimentally: small sizes fire on single dominating wavelet responses, while large sizes yield maxima in vector length that are not outspoken.
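The sliding-window search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the `(dx, dy)` responses have already been computed and weighted, uses the π/3 window angle from the paper, and the `steps` parameter (number of window positions) is a hypothetical choice.

```python
import math

def dominant_orientation(responses, window=math.pi / 3, steps=72):
    """Return the angle (radians) of the longest summed response vector.

    responses: iterable of (dx, dy) Haar-wavelet responses.
    window:    angular width of the sliding orientation window.
    steps:     number of window positions tried (assumed value).
    """
    samples = [(math.atan2(dy, dx), dx, dy) for dx, dy in responses]
    best_len2, best_angle = -1.0, 0.0
    for k in range(steps):
        start = -math.pi + k * (2 * math.pi / steps)
        sum_x = sum_y = 0.0
        for angle, dx, dy in samples:
            # Angular offset from the window start, wrapped into [0, 2*pi).
            offset = (angle - start) % (2 * math.pi)
            if offset < window:
                sum_x += dx
                sum_y += dy
        len2 = sum_x ** 2 + sum_y ** 2
        if len2 > best_len2:
            best_len2, best_angle = len2, math.atan2(sum_y, sum_x)
    return best_angle

# Three responses clustered around 0.5 rad plus two weak outliers.
cluster = [(math.cos(a), math.sin(a)) for a in (0.4, 0.5, 0.6)]
angle = dominant_orientation(cluster + [(-0.2, 0.0), (0.0, -0.3)])
```

A very narrow `window` would lock onto a single strong response, while a very wide one would average over most of the circle, which is the trade-off the last bullet describes.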

### 4.2 Descriptor Components

- For the extraction of the descriptor, the first step consists of constructing a square region centered around the interest point, and oriented along the orientation selected in the previous section.
- The wavelet responses are invariant to a bias in illumination.
- The extended descriptor for 4 × 4 subregions (SURF-128) comes out to perform best.
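- For fast indexing during the matching stage, the sign of the Laplacian (i.e. the trace of the Hessian matrix) of each interest point is included; this minimal information allows for faster matching and gives a slight increase in performance.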
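The invariance to an illumination bias can be checked numerically under the linear photometric model (scale factor plus offset) from the introduction. The sketch below uses a toy 2 × 2 Haar-like difference, which is an assumption made for brevity, not the filter layout of the paper. The unit-vector normalisation is how the paper handles the contrast (scale) factor.

```python
import math

def haar_responses(patch):
    """Return (dx, dy) for a 2x2 patch [[a, b], [c, d]]."""
    (a, b), (c, d) = patch
    dx = (b + d) - (a + c)  # right half minus left half
    dy = (c + d) - (a + b)  # bottom half minus top half
    return dx, dy

def normalise(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

patch = [[10, 20], [30, 60]]
brighter = [[p + 25 for p in row] for row in patch]     # offset only
higher_contrast = [[2 * p for p in row] for row in patch]  # scale only

same_after_offset = haar_responses(brighter) == haar_responses(patch)
unit_a = normalise(haar_responses(patch))
unit_b = normalise(haar_responses(higher_contrast))
```

The offset cancels because each response is a difference of equally sized sums, and the scale factor cancels once the vector is normalised.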

### 5 Experimental Results

- First, the authors present results on a standard evaluation set, for both the detector and the descriptor.
- For the detector comparison, the authors selected the two viewpoint changes (Graffiti and Wall), one zoom and rotation (Boat) and lighting changes (see Fig. 6, discussed below).
- The SURF descriptor outperforms the other descriptors in a systematic and significant way, with sometimes more than 10% improvement in recall for the same level of precision.
- The timings were evaluated on a standard Linux PC (Pentium IV, 3GHz).
- The object shown on the reference image with the highest number of matches with respect to the test image is chosen as the recognised object.
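The recognition rule in the last bullet amounts to a vote over reference images. The sketch below is one possible reading: the matching criterion (nearest neighbour within a fixed Euclidean distance `max_dist`) and all names are assumptions for illustration, and the source does not specify them here.

```python
import math

def count_matches(test_desc, ref_desc, max_dist=0.3):
    """Count test descriptors whose nearest reference descriptor is within max_dist."""
    count = 0
    for t in test_desc:
        nearest = min(math.dist(t, r) for r in ref_desc)
        if nearest <= max_dist:
            count += 1
    return count

def recognise(test_desc, references):
    """references: dict mapping object name -> non-empty list of descriptors."""
    return max(references,
               key=lambda name: count_matches(test_desc, references[name]))

# Toy 2-D descriptors; real SURF descriptors have 64 or 128 dimensions.
references = {"cup": [[0.0, 0.0], [1.0, 1.0]], "vase": [[5.0, 5.0]]}
test_image = [[0.1, 0.0], [1.0, 1.1]]
winner = recognise(test_image, references)
```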

### 6 Conclusion

- The authors have presented a fast and performant interest point detection-description scheme which outperforms the current state of the art, both in speed and accuracy.
- The descriptor is easily extendable for the description of affine invariant regions.
- The authors gratefully acknowledge the support from Swiss SNF NCCR project IM2, Toyota-TME and the Flemish Fund for Scientific Research.

##### Citations

### Cites background or methods from "SURF: speeded up robust features"

...This has led to an intensive search for replacements with lower computation cost; arguably the best of these is SURF [2]....

...There are various ways to describe the orientation of a keypoint; many of these involve histograms of gradient computations, for example in SIFT [17] and the approximation by block patterns in SURF [2]....

##### References

### "SURF: speeded up robust features" refers background in this paper

...In order to make the paper more self-contained, we succinctly discuss the concept of integral images, as defined by [23]....

### "SURF: speeded up robust features" refers methods in this paper

...Focusing on speed, Lowe [12] approximated the Laplacian of Gaussian (LoG) by a Difference of Gaussians (DoG) filter....

##### Frequently Asked Questions (18)

###### Q2. What future work is planned in "SURF: speeded up robust features"?

Future work will aim at optimising the code for additional speed up.

###### Q3. What is the benefit of avoiding the overkill of rotation invariance in such cases?

The benefit of avoiding the overkill of rotation invariance in such cases is not only increased speed, but also increased discriminative power.

###### Q4. How many dimensions are used in the SURF scheme?

Only 64 dimensions are used, reducing the time for feature computation and matching while simultaneously increasing robustness.

###### Q5. What is the advantage of using the determinant of the Hessian matrix rather than its trace?

Using the determinant of the Hessian matrix rather than its trace (the Laplacian) seems advantageous, as it fires less on elongated, ill-localised structures.
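A small numerical comparison illustrates the point. It uses the exact determinant and trace of a 2 × 2 Hessian with hand-picked second derivatives, not the box-filter approximation of the paper; the values are illustrative assumptions.

```python
def hessian_det_and_trace(dxx, dyy, dxy):
    """Return (determinant, trace) of the Hessian [[dxx, dxy], [dxy, dyy]]."""
    return dxx * dyy - dxy * dxy, dxx + dyy

# Blob-like structure: strong curvature in both directions.
blob = hessian_det_and_trace(-2.0, -2.0, 0.0)
# Elongated ridge: curvature in one direction only.
ridge = hessian_det_and_trace(-2.0, 0.0, 0.0)
```

The trace still responds to the ridge, whereas the determinant vanishes there, so thresholding on the determinant suppresses such elongated, poorly localised structures.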

###### Q6. What did the authors use to arrive at the SURF descriptor?

In order to arrive at these SURF descriptors, the authors experimented with fewer and more wavelet features, using dx² and dy², higher-order wavelets, PCA, median values, average values, etc.

###### Q7. What is the valuable property of an interest point detector?

The most valuable property of an interest point detector is its repeatability, i.e. whether it reliably finds the same interest points under different viewing conditions.

###### Q8. What is the first step for constructing the descriptor?

For the extraction of the descriptor, the first step consists of constructing a square region centered around the interest point, and oriented along the orientation selected in the previous section.

###### Q9. What is the widely used detector?

The most widely used detector probably is the Harris corner detector [10], proposed back in 1988, based on the eigenvalues of the second-moment matrix.

###### Q10. What is the underlying intensity structure of the sub-regions?

Each sub-region has a four-dimensional descriptor vector v for its underlying intensity structure: v = (∑dx, ∑dy, ∑|dx|, ∑|dy|).
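The per-sub-region vector and its concatenation over a 4 × 4 grid can be sketched as below. The function names and the input layout (one list of `(dx, dy)` samples per sub-region) are assumptions for illustration; sampling, weighting and normalisation are left out.

```python
def subregion_vector(responses):
    """Return [sum dx, sum dy, sum |dx|, sum |dy|] for one sub-region.

    responses: list of (dx, dy) wavelet responses sampled in the sub-region.
    """
    return [
        sum(dx for dx, _ in responses),
        sum(dy for _, dy in responses),
        sum(abs(dx) for dx, _ in responses),
        sum(abs(dy) for _, dy in responses),
    ]

def descriptor(subregions):
    """Concatenate the 4-D vectors of all sub-regions (16 for a 4 x 4 grid)."""
    vec = []
    for responses in subregions:
        vec.extend(subregion_vector(responses))
    return vec

example = subregion_vector([(1, -2), (-3, 4)])
full = descriptor([[(1, -2), (-3, 4)]] * 16)
```

With 16 sub-regions and four values each, the concatenated vector has the 64 dimensions mentioned for the standard descriptor.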

###### Q11. What are some examples of interest point detectors?

Examples are the salient region detector proposed by Kadir and Brady [13], which maximises the entropy within the region, and the edge-based region detector proposed by Jurie et al. [14].

###### Q12. How many additions to the sum of the intensities?

With IΣ calculated, it only takes four additions to calculate the sum of the intensities over any upright, rectangular area, independent of its size.
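A minimal sketch of the idea, assuming an image given as a list of rows and an integral table padded with a leading row and column of zeros (a common convenience, not necessarily the layout used by the authors):

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table; entry [y][x] is the sum of img rows < y, cols < x."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def box_sum(table, top, left, bottom, right):
    """Sum over rows top..bottom-1 and cols left..right-1 using four lookups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
table = integral_image(img)
lower_right = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

The cost of `box_sum` does not depend on the rectangle size, which is what makes large box filters as cheap as small ones.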

###### Q13. What are the effects of the descriptor?

Skew, anisotropic scaling, and perspective effects are assumed to be second-order effects that are covered to some degree by the overall robustness of the descriptor.

###### Q14. What is the Frobenius norm for the filter?

For example, their 27 × 27 filter corresponds to σ = 3 × 1.2 = 3.6 = s. Furthermore, as the Frobenius norm remains constant for their filters, they are already scale normalised [26].
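The arithmetic generalises to any filter size. The sketch below assumes the paper's base case, in which the 9 × 9 box filter approximates Gaussian derivatives with σ = 1.2; the function name and default arguments are illustrative.

```python
def filter_scale(filter_size, base_size=9, base_sigma=1.2):
    """Scale s (= sigma) corresponding to a square box filter of the given side length."""
    return base_sigma * filter_size / base_size

scale_27 = filter_scale(27)  # three times the base filter size
```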

###### Q15. What is the effect of fast matching?

This PCA-SIFT yields a 36-dimensional descriptor which is fast for matching, but it proved to be less distinctive than SIFT in a second comparative study by Mikolajczyk et al. [8], and its slower feature computation reduces the effect of fast matching.

###### Q16. Why is SURF better suited to the feature space?

Due to space limitations, only results on similarity threshold based matching are shown in Fig. 7, as this technique is better suited to represent the distribution of the descriptor in its feature space [8] and it is in more general use.

###### Q17. What is the way to compute the SURF descriptor?

The authors also propose an upright version of their descriptor (U-SURF) that is not invariant to image rotation and is therefore faster to compute and better suited for applications where the camera remains more or less horizontal.

###### Q18. What is the appealing descriptor for practical uses?

The SIFT descriptor still seems to be the most appealing descriptor for practical uses, and hence also the most widely used nowadays.