A Comparison of Affine Region Detectors
Summary
1 Introduction
- Detecting regions covariant with a class of transformations has now reached some maturity in the computer vision literature.
- In particular, consider images from two viewpoints and the geometric transformation between the images induced by the viewpoint change.
- The confusion probably arises from the fact that, even though the regions themselves are covariant, the normalized image pattern they cover and the feature descriptors derived from them are typically invariant.
2 Affine covariant detectors
- In this section the authors give a brief description of the six region detectors used in the comparison.
- Sections 2.2 and 2.3 describe methods for detecting edge-based regions and intensity extrema-based regions.
- The idea is to select the characteristic scale of a local structure, for which a given function attains an extremum over scales.
- The eigenvalues of the second moment matrix are used to measure the affine shape of the point neighbourhood.
- The authors can therefore use this technique to estimate the shape of initial regions provided by the Harris and Hessian based detector.
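The shape estimation via the second moment matrix can be sketched as follows; this is a minimal illustration with uniform (rather than Gaussian-weighted) derivatives, and the function names are ours, not the paper's:

```python
import numpy as np

def second_moment_matrix(patch):
    """Second moment matrix (structure tensor) of an image patch,
    with uniform weighting for simplicity (the detectors use
    Gaussian-weighted derivatives)."""
    gy, gx = np.gradient(patch.astype(float))
    return np.array([[np.mean(gx * gx), np.mean(gx * gy)],
                     [np.mean(gx * gy), np.mean(gy * gy)]])

def anisotropy(M):
    """Eigenvalue ratio (smaller / larger): 1.0 for an isotropic
    neighbourhood, near 0.0 for a strongly stretched one. Shape
    adaptation iterates until this ratio approaches 1 in the
    normalized frame."""
    lam = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return lam[0] / lam[1]

# A purely horizontal ramp is maximally anisotropic...
ramp = np.tile(np.arange(16.0), (16, 1))
# ...while a radially symmetric paraboloid is isotropic.
X, Y = np.meshgrid(np.linspace(-1, 1, 17), np.linspace(-1, 1, 17))
bowl = X**2 + Y**2
```

The eigenvalue ratio is exactly the quantity the affine adaptation drives towards unity by repeatedly re-estimating the matrix in the normalized patch.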
2.2 An edge-based region detector
- The rationale behind this is that edges are typically rather stable features, that can be detected over a range of viewpoints, scales and/or illumination changes.
- Since intersections of two straight edges occur quite often, the authors cannot simply neglect this case.
- To circumvent this problem, the two photometric quantities given in Equation 4 are combined and locations where both functions reach a minimum value are taken to fix the parameters s1 and s2 along the straight edges.
- Moreover, instead of relying on the correct detection of the Harris corner point, the authors can simply use the straight lines intersection point instead.
- For easy comparison in the context of this paper, the parallelograms representing the invariant regions are replaced by the enclosed ellipses, as shown in figure 4(b).
2.3 Intensity extrema-based region detector
- Here the authors describe a method to detect affine covariant regions that starts from intensity extrema (detected at multiple scales), and explores the image around them in a radial way, delineating regions of arbitrary shape, which are then replaced by ellipses.
- The point for which this function reaches an extremum is invariant under affine geometric and linear photometric transformations (given the ray).
- The function fI(t) is in itself already invariant.
- This ellipse-fitting is again an affine covariant construction.
- Examples of detected regions are displayed in figure 4(a).
2.4 Maximally Stable Extremal region detector
- The word ‘extremal’ refers to the property that all pixels inside the MSER have either higher (bright extremal regions) or lower (dark extremal regions) intensity than all the pixels on its outer boundary.
- The ‘maximally stable’ in MSER describes the property optimized in the threshold selection process.
- This ensures that common photometric changes modelled locally as linear or affine leave E unaffected, even if the camera is non-linear (gamma-corrected).
- After sorting, pixels are marked in the image (either in decreasing or increasing order) and the list of growing and merging connected components and their areas is maintained using the union-find algorithm [38].
- Among the extremal regions, the ‘maximally stable’ ones are those corresponding to thresholds where the relative area change as a function of relative change of threshold is at a local minimum.
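The sweep described above can be sketched with a small union-find; this is an illustrative reimplementation (4-connectivity, dark extremal regions), not the authors' code, and for brevity it only tracks the largest component's area per intensity level rather than the full per-region stability analysis:

```python
import numpy as np

def component_areas(img):
    """Sketch of the MSER sweep: pixels are activated in order of
    increasing intensity and merged with already-active 4-neighbours
    via union-find. Returns the largest connected-component area after
    each intensity level; MSER then keeps the thresholds where the
    relative area change is at a local minimum."""
    h, w = img.shape
    parent = np.arange(h * w)
    size = np.ones(h * w, dtype=int)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra          # union by size
            size[ra] += size[rb]

    active = np.zeros(h * w, dtype=bool)
    areas = {}
    for idx in np.argsort(img, axis=None, kind="stable"):
        active[idx] = True
        y, x = divmod(int(idx), w)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and active[ny * w + nx]:
                union(idx, ny * w + nx)
        # overwrite until the last pixel of this level is processed
        areas[int(img.flat[idx])] = max(size[find(i)]
                                        for i in np.flatnonzero(active))
    return areas

# Toy image: a dark 3-pixel corner, one mid pixel, bright surround.
toy = np.array([[0, 0, 2],
                [0, 1, 2],
                [2, 2, 2]])
```

On the toy image, the component grows from 3 pixels (threshold 0) to 4 (threshold 1) to the full 9; a plateau in such a growth curve is what ‘maximally stable’ rewards.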
2.5 Salient region detector
- This detector is based on the pdf of intensity values computed over an elliptical region.
- Detection proceeds in two steps: first, at each pixel the entropy of the pdf is evaluated over the three-parameter family of ellipses centred on that pixel.
- The set of entropy extrema over scale and the corresponding ellipse parameters are recorded.
- Second, the candidate salient regions over the entire image are ranked using the magnitude of the derivative of the pdf with respect to scale.
- More details about this method can be found in [12].
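The entropy evaluation in the first step can be sketched as follows; this is a minimal illustration on a square patch with intensities assumed in [0, 1) (the detector evaluates it over the three-parameter family of ellipses, which is omitted here):

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Entropy (in bits) of the pdf of intensity values inside a
    region, approximated by a histogram. Flat patches score 0;
    patches whose intensities spread over all bins score log2(bins)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log 0 := 0
    return -np.sum(p * np.log2(p))
```

A constant patch gives entropy 0, while 16 samples spread evenly over 16 bins give the maximum of 4 bits; the detector records extrema of this quantity over scale.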
3 The image data set
- Figure 9 shows examples from the image sets used to evaluate the detectors.
- In the cases of viewpoint change, scale change and blur, the same change in imaging conditions is applied to two different scene types.
- This means that the effect of changing the image conditions can be separated from the effect of changing the scene type.
- The JPEG sequence is generated using a standard xv image browser with the image quality parameter varying from 40% to 2%.
- The composition of these two homographies (approximate and residual) gives an accurate homography between the reference and other image.
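The composition step is a plain matrix product of the two homographies; a minimal sketch (the matrix values below are invented for illustration, not from the paper):

```python
import numpy as np

def compose(H_resid, H_approx):
    """Points map as x' = H_resid @ (H_approx @ x), so the accurate
    reference-to-other homography is the matrix product, normalized
    so that H[2, 2] = 1."""
    H = H_resid @ H_approx
    return H / H[2, 2]

# Example: an approximate scale-by-2 homography followed by a
# residual one-pixel translation in x.
H_approx = np.array([[2.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0],
                     [0.0, 0.0, 1.0]])
H_resid = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
H = compose(H_resid, H_approx)
```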
3.1 Discussion
- Before the authors compare the performance of the different detectors in more detail in the next section, a few more general observations can already be made, simply by examining the output of the different detectors for the images shown in figures 3 and 4.
- For the intensity extrema-based region detector, the algorithm finding intensity extrema is O(n), where n is again the number of pixels.
- The computation times mentioned in this table have all been measured on a Pentium 4 2GHz Linux PC, for the leftmost image shown in figure 9(a), which is 800 × 640 pixels.
- Also, as will be shown in the next section, large regions automatically have better chances of overlapping other regions.
- Here, the authors focus on the original distinguished regions (except for the ellipse fitting for edge-based and MSER regions, to obtain the same shape for all detectors), as they determine the intrinsic quality of a detector.
4 Overlap comparison using homographies
- Two important parameters characterize the performance of a region detector: 1. the repeatability, i.e., the average number of corresponding regions detected in images under different geometric and photometric transformations, both in absolute and relative terms (i.e., percentage-wise), and 2. the accuracy of localization and region estimation.
- Clearly, as the scaling goes to zero there is no intersection of the cones and, as the scaling goes to infinity, the relative amount of overlap, defined as the ratio of the intersection to the union of the ellipses, approaches unity.
- Moreover, it is not straightforward for all detectors to come up with a single parameter that can be varied to obtain the desired number of regions in a meaningful way, i.e., representing some kind of ‘quality measure’ for the regions.
- To give an idea of the number of regions, both absolute and relative repeatability scores are given.
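The relative repeatability score reduces to a one-line ratio; a minimal sketch (note that the paper counts only regions lying in the part of the scene visible in both images):

```python
def repeatability(num_correspondences, n_ref, n_other):
    """Relative repeatability: corresponding region pairs divided by
    the smaller of the two region counts (regions in the common part
    of the scene), expressed as a percentage."""
    return 100.0 * num_correspondences / min(n_ref, n_other)
```

Reporting the absolute number of correspondences alongside this percentage is what lets the comparison separate detectors that are sparse-but-repeatable from those that are dense.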
4.1 Repeatability measure
- Two regions are deemed to correspond if the overlap error, defined as the error in the image area covered by the regions, is sufficiently small: 1 − (R_μa ∩ R_(HᵀμbH)) / (R_μa ∪ R_(HᵀμbH)) < ε_O, where R_μ represents the elliptic region defined by xᵀμx = 1 and H is the homography relating the two images.
- Here R_μa ∪ R_(HᵀμbH) is the union of the regions, and R_μa ∩ R_(HᵀμbH) is their intersection.
- Then, the authors apply this scale factor to both the region in the reference image and the region detected in the other image which has been mapped onto the reference image, before computing the actual overlap error as described above.
- The precise procedure is given in the Matlab code on http://www.robots.ox.ac.uk/~vgg/research/affine.
- Note that an overlap error of 20% is very small as it corresponds to only a 10% difference between the regions’ radii.
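The overlap error can be approximated by rasterising both ellipses on a common grid; this is an illustrative numeric sketch, not the precise procedure from the paper's Matlab code (grid extent and resolution are arbitrary choices of ours):

```python
import numpy as np

def overlap_error(mu_a, mu_b, extent=3.0, n=400):
    """1 - |A ∩ B| / |A ∪ B| for two elliptic regions x^T mu x <= 1,
    estimated by counting grid points inside each ellipse."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    pts = np.stack([X.ravel(), Y.ravel()])          # 2 x n^2 points

    def inside(mu):
        # quadratic form x^T mu x evaluated at every grid point
        q = np.einsum('ij,ik,kj->j', pts, np.asarray(mu, float), pts)
        return q <= 1.0

    a, b = inside(mu_a), inside(mu_b)
    return 1.0 - np.logical_and(a, b).sum() / np.logical_or(a, b).sum()
```

For two concentric circles of radius 1 and 2 (mu = I and mu = I/4), the intersection is the small disc and the union the large one, so the overlap error is 1 − π/4π = 0.75.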
4.2 Repeatability under various transformations
- In a first set of experiments, the authors fix the overlap error threshold to 40% and the normalized region size to a radius of 30 pixels, and check the repeatability of the different region detectors for gradually increasing transformations, according to the image sets shown in figure 9.
- This can be understood by the fact that in most cases larger transformations result in lower quality images and/or smaller commonly visible parts between the reference image and the other image, and hence a smaller number of regions are detected.
- Figure 13(a) shows the repeatability score and figure 13(b) the absolute number of correspondences.
- The Hessian-Affine detector performs best, followed by MSER and Harris-Affine detectors.
- The number of corresponding regions detected on the structured scene is much lower than for the textured scene, and it changes by a different factor for different detectors.
4.3 More detailed tests
- To further validate their experimental setup and to obtain a deeper insight in what is actually going on, a more detailed analysis is performed on one image pair with a viewpoint change of 40 degrees, namely the first and third column of the graffiti sequence shown in figure 9(a).
- Choosing a lower threshold results in more accurate regions.
- Figure 21(b) shows how the repeatability scores vary as a function of the normalized region size, with the overlap error threshold fixed to 40%.
- This results in a plot showing the repeatability scores for different detectors as a function of region size.
- The results for Hessian-Affine, Harris-Affine and IBR are similar.
5 Matching experiments
- In the previous section, the performance of the different region detectors is evaluated from a rather theoretical point of view, focusing on the overlap error and repeatability.
- To this end, the authors compute a descriptor for the regions, and then check to what extent matching with the descriptor gives the correct region match.
- This descriptor gave the best matching results in an evaluation of different descriptors computed on scale and affine invariant regions [25, 28].
- To this end, each elliptical region is first mapped to a circular region of 30 × 30 pixels, and rotated based on the dominant gradient orientation, to compensate for the affine geometric deformations, as shown in figure 2(e).
- Note that unlike in section 4, this mapping concerns descriptors; the region size is coincidentally the same (30 pixels).
5.1 Matching score
- Again the measure is computed between a reference image and the other images in a set.
- The matching score is computed in two steps.
- Only a single match is allowed for each region.
- If the matching results for a particular feature type do not follow those of the repeatability test, this means that the distinctiveness of these features differs from that of the features found by the other detectors.
- Indeed, rather than taking the original distinguished region, one might also rescale the region first, which typically leads to more discriminative power – certainly for the small regions.
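The matching step can be sketched as nearest-neighbour descriptor matching with a one-to-one constraint; a minimal illustration (the greedy best-distance-first assignment and the toy descriptors are our choices, not the paper's exact procedure):

```python
import numpy as np

def nearest_neighbour_matches(desc_a, desc_b):
    """Match descriptors by Euclidean distance, allowing only a single
    match per region: candidate pairs are accepted greedily in order
    of increasing distance. Returns a list of (index_a, index_b)."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    pairs = sorted((d[i, j], i, j)
                   for i in range(d.shape[0]) for j in range(d.shape[1]))
    used_a, used_b, matches = set(), set(), []
    for dist, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return matches

# Toy 2-D descriptors: a0 is closest to b1, a1 is closest to b0.
matches = nearest_neighbour_matches(np.array([[0.0, 0.0], [10.0, 0.0]]),
                                    np.array([[9.5, 0.0], [0.5, 0.0]]))
```

A match is then counted as correct when the matched pair also satisfies the overlap criterion of section 4; the matching score is the ratio of correct matches to the smaller region count.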
5.2 Matching under various transformations
- Figures 13 - 20 (c) and (d) give the results of the matching experiment for the different types of transformations.
- These are basically the same plots as given in figures 13 - 20 (a) and (b) but now focusing on regions that have actually been matched, rather than just corresponding regions.
- These detectors find several slightly different regions containing the same local structure all of which have a small overlap error.
- The same change in ranking for Harris-Affine and Hessian-Affine can be observed on the results for other transformations.
6 Conclusions
- In this paper the authors have presented the state of the art on affine covariant region detectors and have compared their performance.
- This also holds for IBR since both methods are designed for similar region types.
- Hessian-Affine and Harris-Affine provide more regions than the other detectors, which is useful in matching scenes with occlusion and clutter.
- Several detectors should be used simultaneously to obtain the best performance.
- Naturally, regions are also detected at depth and surface orientation discontinuities of 3D scenes.
Frequently Asked Questions (11)
Q2. What is the detector for a particular type of scene?
If only a very small number of matches is needed (e.g. for computing epipolar geometry), the MSER or IBR detector is the best choice for this type of scene.
Q3. How can an affinity be used to model image distortions arising from viewpoint changes?
an affinity is sufficient to locally model image distortions arising from viewpoint changes, provided that (1) the scene surface can be locally approximated by a plane, or the camera rotates about its centre, and (2) perspective effects are ignored, which are typically small on a local scale anyway.
Q4. How can the authors determine the affine shape of an image patch?
Note that rotation preserves the eigenvalue ratio for an image patch, therefore, the affine deformation can be determined up to a rotation factor.
Q5. What are the reasons for the lack of 100% performance of a detector?
The reasons for this lack of 100% performance are sometimes specific to detectors and scene types (discussed below), and sometimes general – the transformation is outside the range for which the detector is designed, e.g. discretization errors, noise, non-linear illumination changes, projective deformations etc.
Q6. What is the effect of region density on the repeatability score of a detector?
Also the region density, i.e., the number of detected regions per fixed amount of pixel area, may have an effect on the repeatability score of a detector.
Q7. How many regions are there for the textured blur scene?
The number of regions also strongly depends on the scene type, e.g. for the MSER detector there are about 2600 regions for the textured blur scene (figure 9(f)) and only 230 for the light change scene (figure 9(h)).
Q8. What is the complexity of the automatic scale selection and shape adaptation algorithm?
The complexity of the automatic scale selection and shape adaptation algorithm is O((m + k)p), where p is the number of initial points, m is a number of investigated scales in the automatic scale selection and k is a number of iterations in the shape adaptation algorithm.
Q9. What is the function fI(t) evaluated along the ray?
The following function is evaluated along each ray: fI(t) = |I(t) − I0| / max( (1/t) ∫₀ᵗ |I(s) − I0| ds, d ), with t an arbitrary parameter along the ray, I(t) the intensity at position t, I0 the intensity value at the extremum, and d a small number added to prevent a division by zero.
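The ray function can be sketched numerically; a minimal illustration on a discretely sampled ray (unit step, trapezoidal integration; the sampling scheme is our assumption):

```python
import numpy as np

def f_I(intensities, I0, d=1e-3):
    """Sketch of the IBR ray function
        f_I(t) = |I(t) - I0| / max( (1/t) * int_0^t |I(s) - I0| ds, d )
    evaluated at t = 1 .. len(intensities)-1 along one ray, where
    intensities[0] is the value at the extremum's side of the ray."""
    dev = np.abs(np.asarray(intensities, dtype=float) - I0)
    t = np.arange(1, len(dev))
    # trapezoidal integral of |I(s) - I0| from 0 up to each t
    cum = np.cumsum((dev[1:] + dev[:-1]) / 2.0)
    return dev[1:] / np.maximum(cum / t, d)
```

On a constant ray the function is identically zero, while a sudden intensity jump produces a sharp extremum, which is exactly where the region boundary is placed along that ray.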
Q10. What is the percentage of correct matches?
As the threshold increases, so does the number of matches (figure 22(a)); the numbers of correct and false matches both grow, but the false matches grow faster, hence the percentage of correct matches drops.
Q11. What is the basic measure of accuracy and repeatability of a region detector?
The basic measure of accuracy and repeatability the authors use is the relative amount of overlap between the detected region in the reference image and the region detected in the other image, projected onto the reference image using the homography relating the images.