
BRIEF: binary robust independent elementary features

TL;DR: This work proposes to use binary strings as an efficient feature point descriptor, which is called BRIEF, and shows that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests.
Abstract: We propose to use binary strings as an efficient feature point descriptor, which we call BRIEF. We show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and U-SURF on standard benchmarks and show that it yields a similar or better recognition performance, while running in a fraction of the time required by either.

Summary (3 min read)

1 Introduction

  • Feature point descriptors are now at the core of many Computer Vision technologies, such as object recognition, 3D reconstruction, image retrieval, and camera localization.
  • Hash functions can reduce descriptors such as SIFT to binary strings whose similarity can be measured by the Hamming distance.
  • Furthermore, strings can be compared by computing the Hamming distance, which is extremely fast on modern CPUs that often provide a specific instruction to perform an XOR or bit-count operation, as is the case in the latest SSE [10] instruction set (see the sketch after this list).
  • This means that BRIEF easily outperforms other fast descriptors such as SURF and U-SURF in terms of speed, as will be shown in the Results section.
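To make the XOR-and-bit-count idea concrete, here is a minimal Python sketch of Hamming-distance computation between packed binary descriptors. It is illustrative only, not the authors' code; a native implementation would use a dedicated instruction such as POPCNT.

```python
import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary descriptors (uint8 arrays).

    XOR sets exactly the bits where the descriptors differ; summing the
    unpacked bits is the bit count that a hardware popcount instruction
    performs in a single operation.
    """
    return int(np.unpackbits(a ^ b).sum())

# Two random 256-bit descriptors, stored as 32 bytes each (BRIEF-32).
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = rng.integers(0, 256, size=32, dtype=np.uint8)
print(hamming(d1, d2))  # an integer between 0 and 256
```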

3 Method

  • The authors’ approach is inspired by earlier work [9, 15] that showed that image patches could be effectively classified on the basis of a relatively small number of pairwise intensity comparisons.
  • The results of these tests were used to train either randomized classification trees [15] or a Naive Bayesian classifier [9] to recognize patches seen from different viewpoints.
  • When creating such descriptors, the only choices that have to be made are those of the kernels used to smooth the patches before intensity differencing and the spatial arrangement of the (x,y)-pairs.
  • In short, for both images of a pair and for a given number of corresponding keypoints between them, it quantifies how often the correct match can be established using BRIEF for description and the Hamming distance as the metric for matching.
  • This rate can be computed reliably because the scene is planar and the homography between images is known.
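A hedged sketch of this recognition-rate metric, assuming keypoints as (N, 2) pixel arrays, descriptors as packed uint8 rows, and a known 3×3 homography H; the function name and the 2-pixel tolerance are our illustrative choices, not values from the paper:

```python
import numpy as np

def recognition_rate(kps1, desc1, kps2, desc2, H, tol=2.0):
    """Fraction of image-1 keypoints whose Hamming nearest neighbour in
    image 2 lands within `tol` pixels of the homography-projected location."""
    # Project image-1 keypoints into image 2 with the known homography.
    pts = np.hstack([kps1, np.ones((len(kps1), 1))]) @ H.T
    pts = pts[:, :2] / pts[:, 2:3]
    correct = 0
    for p, d in zip(pts, desc1):
        # Hamming distances from descriptor d to every image-2 descriptor.
        dists = np.unpackbits(desc2 ^ d, axis=1).sum(axis=1)
        if np.linalg.norm(kps2[np.argmin(dists)] - p) <= tol:
            correct += 1
    return correct / len(kps1)
```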

3.1 Smoothing Kernels

  • By construction, the tests of Eq. 1 take only the information at single pixels into account and are therefore very noise-sensitive.
  • It is for the same reason that images need to be smoothed before they can be meaningfully differentiated when looking for edges.
  • This analogy applies because their intensity difference tests can be thought of as evaluating the sign of the derivatives within a patch.
  • The more difficult the matching, the more important smoothing becomes to achieving good performance.
  • For the corresponding discrete kernel window the authors found a size of 9×9 pixels to be necessary and sufficient.

3.2 Spatial Arrangement of the Binary Tests

  • The authors experimented with the five sampling geometries depicted by Fig. 2.
  • The (xi,yi) locations are evenly distributed over the patch and tests can lie close to the patch border.
  • The first location xi is sampled from a Gaussian centered around the origin while the second location is sampled from another Gaussian centered on xi.
  • Test locations outside the patch are clamped to the edge of the patch.
  • The isotropic Gaussian sampling (geometry II) performs best; for this reason, in all further experiments presented in this paper, it is the one the authors use.

3.3 Distance Distributions

  • The authors take a closer look at the distribution of Hamming distances between their descriptors.
  • To this end the authors extract about 4000 matching points from the five image pairs of the Wall sequence.
  • Since the maximum possible Hamming distance is 32 · 8 = 256 bits, the distribution of distances for non-matching points is, unsurprisingly, roughly Gaussian and centered around 128 (the short simulation after this list illustrates why).
  • Since establishing a match can be understood as classifying pairs of points as being a match or not, a classifier that relies on these Hamming distances will work best when their distributions are most separated.
  • As the authors show in section 4, this is indeed what happens: recognition rates are higher in the first pairs of the Wall sequence than in the subsequent ones.
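The "centered around 128" figure follows from independence: two unrelated bits agree half the time, so random 256-bit strings differ in about half of their 32 · 8 = 256 bits. A short self-contained simulation (ours, not data from the paper) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)
# 4000 pairs of unrelated 256-bit (32-byte) descriptors.
a = rng.integers(0, 256, size=(4000, 32), dtype=np.uint8)
b = rng.integers(0, 256, size=(4000, 32), dtype=np.uint8)
d = np.unpackbits(a ^ b, axis=1).sum(axis=1)
print(d.mean(), d.std())  # mean near 128; an approximately Gaussian spread
```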

4 Results

  • The authors compare their method against several competing approaches.
  • For evaluation purposes, the authors rely on two straightforward metrics, elapsed CPU time and recognition rate.
  • Since the authors apply the same procedure to all descriptors, not only their own, the relative rankings they obtain are still valid and speak in BRIEF’s favor.
  • This explains in part why both BRIEF and U-SURF outperform SURF.
  • O-BRIEF-32 is not meant to represent a practical approach but to demonstrate that the response to in-plane rotations is more a function of the quality of the orientation estimator than of the descriptor itself, as evidenced by the fact that O-BRIEF-32 and SURF are almost perfectly superposed.

5 Conclusion

  • Construction and matching for this descriptor are not only much faster than for other state-of-the-art ones; the descriptor also tends to yield higher recognition rates, as long as invariance to large in-plane rotations is not a requirement.
  • The BRIEF code being very simple, the authors will be happy to make it publicly available.
  • It is also important from a more theoretical viewpoint because it confirms the validity of the recent trend [18, 12] that involves moving from the Euclidean to the Hamming distance for matching purposes.
  • Given fast orientation estimators, there is no theoretical reason why orientation invariance could not be added without a significant speed penalty.


BRIEF: Binary Robust Independent
Elementary Features
Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua
CVLab, EPFL, Lausanne, Switzerland
e-mail: firstname.lastname@epfl.ch
Abstract. We propose to use binary strings as an efficient feature point descriptor, which we call BRIEF. We show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and U-SURF on standard benchmarks and show that it yields a similar or better recognition performance, while running in a fraction of the time required by either.
1 Introduction
Feature point descriptors are now at the core of many Computer Vision technologies, such as object recognition, 3D reconstruction, image retrieval, and camera localization. Since applications of these technologies have to handle ever more data or to run on mobile devices with limited computational resources, there is a growing need for local descriptors that are fast to compute, fast to match, and memory efficient.
One way to speed up matching and reduce memory consumption is to work with short descriptors. They can be obtained by applying dimensionality reduction, such as PCA [1] or LDA [2], to an original descriptor such as SIFT [3] or SURF [4]. For example, it was shown in [5–7] that floating point values of the descriptor vector could be quantized using very few bits per value without loss of recognition performance. An even more drastic dimensionality reduction can be achieved by using hash functions that reduce SIFT descriptors to binary strings, as done in [8]. These strings represent binary descriptors whose similarity can be measured by the Hamming distance.
While effective, these approaches to dimensionality reduction require first computing the full descriptor before further processing can take place. In this paper, we show that this whole computation can be shortcut by directly computing binary strings from image patches. The individual bits are obtained by comparing the intensities of pairs of points along the same lines as in [9] but without requiring a training phase. We refer to the resulting descriptor as BRIEF.
⋆ This work has been supported in part by the Swiss National Science Foundation.

Our experiments show that only 256 bits, or even 128 bits, often suffice
to obtain very good matching results. BRIEF is therefore very efficient both to
compute and to store in memory. Furthermore, comparing strings can be done by
computing the Hamming distance, which can be done extremely fast on modern
CPUs that often provide a specific instruction to perform an XOR or bit count
operation, as is the case in the latest SSE [10] instruction set.
This means that BRIEF easily outperforms other fast descriptors such as
SURF and U-SURF in terms of speed, as will be shown in the Results section.
Furthermore, it also outperforms them in terms of recognition rate in many
cases, as we will demonstrate using benchmark datasets.
2 Related Work
The SIFT descriptor [3] is highly discriminant but, being a 128-vector, is relatively slow to compute and match. This can be a drawback for real-time applications such as SLAM that keep track of many points as well as for algorithms that require storing very large numbers of descriptors, for example for large-scale 3D reconstruction.
There are many approaches to solving this problem by developing faster to
compute and match descriptors, while preserving the discriminative power of
SIFT. The SURF descriptor [4] represents one of the best known ones. Like
SIFT, it relies on local gradient histograms but uses integral images to speed up
the computation. Different parameter settings are possible but, since using only
64 dimensions already yields good recognition performances, that version has
become very popular and a de facto standard. This is why we compare ourselves
to it in the Results section.
SURF addresses the issue of speed but, since the descriptor is a 64-vector of floating-point values, representing it still requires 256 bytes. This becomes significant when millions of descriptors must be stored. There are three main classes of approaches to reducing this number.

The first involves dimensionality reduction techniques such as Principal Component Analysis (PCA) or Linear Discriminant Embedding (LDE). PCA is very easy to perform and can reduce descriptor size at no loss in recognition performance [1]. By contrast, LDE requires labeled training data, in the form of descriptors that should be matched together, which is more difficult to obtain. It can improve performance [2] but can also overfit and degrade performance.
A second way to shorten a descriptor is to quantize its floating-point coordinates into integers coded on fewer bits. In [5], it is shown that the SIFT descriptor can be quantized using only 4 bits per coordinate. Quantization is used for the same purpose in [6, 7]. It is a simple operation that results not only in memory gain but also in faster matching as computing the distance between short vectors can then be done very efficiently on modern CPUs. In [6], it is shown that for some parameter settings of the DAISY descriptor, PCA and quantization can be combined to reduce its size to 60 bits. However, in this approach the Hamming distance cannot be used for matching because the bits are, in contrast to BRIEF, arranged in blocks of four and hence cannot be processed independently.
A third and more radical way to shorten a descriptor is to binarize it. For example, [8] drew its inspiration from Locality Sensitive Hashing (LSH) [11] to turn floating-point vectors into binary strings. This is done by thresholding the vectors after multiplication with an appropriate matrix. Similarity between descriptors is then measured by the Hamming distance between the corresponding binary strings. This is very fast because the Hamming distance can be computed very efficiently with a bitwise XOR operation followed by a bit count. The same algorithm was applied to the GIST descriptor to obtain a binary description of an entire image [12]. Another way to binarize the GIST descriptor is to use nonlinear Neighborhood Component Analysis [12, 13], which seems more powerful but probably slower at run-time.
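As an illustration of the binarization scheme just described, the following is a minimal random-hyperplane sketch in the spirit of LSH; the Gaussian projection matrix and the bit count are illustrative assumptions, not the exact construction of [8]:

```python
import numpy as np

def lsh_binarize(descriptors: np.ndarray, n_bits: int = 128,
                 seed: int = 0) -> np.ndarray:
    """Turn (N, D) float descriptors (e.g. SIFT vectors) into packed bits.

    Multiply by a random projection matrix, threshold at zero, and pack:
    similar vectors tend to fall on the same side of most hyperplanes, so
    Hamming distance on the output approximates similarity on the input.
    """
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((descriptors.shape[1], n_bits))
    bits = (descriptors @ P > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)  # (N, n_bits/8) Hamming-ready strings
```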
While all three classes of shortening techniques provide satisfactory results,
relying on them remains inefficient in the sense that first computing a long
descriptor then shortening it involves a substantial amount of time-consuming
computation. By contrast, the approach we advocate in this paper directly builds
short descriptors by comparing the intensities of pairs of points without ever
creating a long one. Such intensity comparisons were used in [9] for classification
purposes and were shown to be very powerful in spite of their extreme simplicity.
Nevertheless, the present approach is very different from [9] and [14] because it
does not involve any form of online or offline training.
3 Method
Our approach is inspired by earlier work [9, 15] that showed that image patches
could be effectively classified on the basis of a relatively small number of pair-
wise intensity comparisons. The results of these tests were used to train either
randomized classification trees [15] or a Naive Bayesian classifier [9] to recognize
patches seen fr om different viewpoints. Here, we do away with both the classifier
and the trees, and simply create a bit vector out of the test responses, which we
compute after having smoothed the image patch.
More specifically, we define test τ on patch p of size S × S as

\[
\tau(\mathbf{p};\, \mathbf{x}, \mathbf{y}) :=
\begin{cases}
1 & \text{if } \mathbf{p}(\mathbf{x}) < \mathbf{p}(\mathbf{y}), \\
0 & \text{otherwise,}
\end{cases}
\tag{1}
\]

where p(x) is the pixel intensity in a smoothed version of p at x = (u, v)ᵀ.
Choosing a set of n_d (x, y)-location pairs uniquely defines a set of binary tests. We take our BRIEF descriptor to be the n_d-dimensional bitstring

\[
f_{n_d}(\mathbf{p}) := \sum_{1 \le i \le n_d} 2^{\,i-1}\, \tau(\mathbf{p};\, \mathbf{x}_i, \mathbf{y}_i). \tag{2}
\]

In this paper we consider n_d = 128, 256, and 512 and will show in the Results section that these yield good compromises between speed, storage efficiency, and recognition rate. In the remainder of the paper, we will refer to BRIEF descriptors as BRIEF-k, where k = n_d/8 represents the number of bytes required to store the descriptor.
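Eqs. 1 and 2 translate almost directly into code. Below is a minimal, unoptimized Python sketch; the use of SciPy's gaussian_filter for pre-smoothing and all names are our assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def brief(image: np.ndarray, keypoint: tuple, pairs: np.ndarray,
          sigma: float = 2.0) -> np.ndarray:
    """Minimal BRIEF sketch following Eqs. 1-2.

    `pairs` is an (n_d, 4) integer array of (u1, v1, u2, v2) test-location
    offsets from the keypoint, drawn once and reused for all keypoints.
    Returns n_d/8 packed bytes, e.g. 32 bytes (BRIEF-32) for n_d = 256.
    """
    p = gaussian_filter(image.astype(np.float32), sigma)  # smoothed intensities
    cx, cy = keypoint
    bits = np.empty(len(pairs), dtype=np.uint8)
    for i, (u1, v1, u2, v2) in enumerate(pairs):
        # tau(p; x, y) = 1 if p(x) < p(y), 0 otherwise   (Eq. 1)
        bits[i] = p[cy + v1, cx + u1] < p[cy + v2, cx + u2]
    return np.packbits(bits)  # the bitstring of Eq. 2, one bit per test
```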
When creating such descriptors, the only choices that have to be made are those of the kernels used to smooth the patches before intensity differencing and the spatial arrangement of the (x, y)-pairs. We discuss these in the remainder of this section.
To this end, we use the Wall dataset that we will describe in more detail in section 4. It contains five image pairs, with the first image being the same in all pairs and the second image shot from a monotonically growing baseline, which makes matching increasingly difficult. To compare the pertinence of the various potential choices, we use as a quality measure the recognition rate in image pairs that will be precisely defined at the beginning of section 4. In short, for both images of a pair and for a given number of corresponding keypoints between them, it quantifies how often the correct match can be established using BRIEF for description and the Hamming distance as the metric for matching. This rate can be computed reliably because the scene is planar and the homography between images is known. It can therefore be used to check whether points truly correspond to each other or not.
3.1 Smoothing Kernels
By construction, the tests of Eq. 1 take only the information at single pixels into
account and are therefore very noise-sensitive. By pre-smoothing the patch, this
sensitivity can be reduced, thus increasing the stability and repeatability of the
descriptors. It is for the same reason that images need to be smoothed before
they can be meaningfully differentiated when looking for edges. This analogy
applies because our intensity difference tests can be thought of as evaluating the
sign of the derivatives within a patch.
Fig. 1 illustrates the effects of increasing amounts of Gaussian smoothing on the recognition rates for variances of the Gaussian kernel ranging from 0 to 3. The more difficult the matching, the more important smoothing becomes to achieving good performance. Furthermore, the recognition rates remain relatively constant in the 1 to 3 range and, in practice, we use a value of 2. For the corresponding discrete kernel window we found a size of 9×9 pixels to be necessary and sufficient.
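In code, this pre-smoothing is a single call. A small sketch with scipy.ndimage (our choice of library); the truncate value is set so that σ = 2 yields the 9×9 discrete window reported above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(48, 48)).astype(np.float32)  # stand-in patch

# sigma = 2 with truncate = 2.0 gives a kernel radius of
# int(2.0 * 2.0 + 0.5) = 4, i.e. a 9x9 discrete window.
smoothed = gaussian_filter(patch, sigma=2.0, truncate=2.0)
```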
3.2 Spatial Arrangement of the Binary Tests
Generating a length-n_d bit vector leaves many options for selecting the n_d test locations (x_i, y_i) of Eq. 1 in a patch of size S × S. We experimented with the five sampling geometries depicted by Fig. 2. Assuming the origin of the patch coordinate system to be located at the patch center, they can be described as follows.

I) (X, Y) i.i.d. Uniform(−S/2, S/2): The (x_i, y_i) locations are evenly distributed over the patch and tests can lie close to the patch border.

[Figure 1: grouped bar chart of recognition rates for the Wall stereo pairs 1|2 through 1|6, at smoothing levels from no smoothing up to σ = 3.05.]

Fig. 1. Each group of 10 bars represents the recognition rates in one specific stereo pair for increasing levels of Gaussian smoothing. Especially for the hard-to-match pairs, which are those on the right side of the plot, smoothing is essential in slowing down the rate at which the recognition rate decreases.
Fig. 2. Different approaches to choosing the test locations. All except the rightmost one are selected by random sampling. 128 tests are shown in every image.
II) (X, Y) i.i.d. Gaussian(0, S²/25): The tests are sampled from an isotropic Gaussian distribution. Experimentally we found S/2 = (5/2)σ, i.e. σ² = S²/25, to give best results in terms of recognition rate.

III) X i.i.d. Gaussian(0, S²/25), Y i.i.d. Gaussian(x_i, S²/100): The sampling involves two steps. The first location x_i is sampled from a Gaussian centered around the origin while the second location is sampled from another Gaussian centered on x_i. This forces the tests to be more local. Test locations outside the patch are clamped to the edge of the patch. Again, experimentally we found S/4 = (5/2)σ, i.e. σ² = S²/100, for the second Gaussian to perform best.

IV) The (x_i, y_i) are randomly sampled from discrete locations of a coarse polar grid introducing a spatial quantization.
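Geometries I–III are easy to reproduce. The sketch below (our naming and structure) draws n_d test-location pairs for an S × S patch centered at the origin and applies the edge clamping described for geometry III:

```python
import numpy as np

def sample_pairs(n_d: int, S: float, geometry: str = "II", seed: int = 0):
    """Draw n_d (x_i, y_i) test-location pairs inside an S x S patch."""
    rng = np.random.default_rng(seed)
    if geometry == "I":      # (X, Y) i.i.d. Uniform(-S/2, S/2)
        x = rng.uniform(-S / 2, S / 2, size=(n_d, 2))
        y = rng.uniform(-S / 2, S / 2, size=(n_d, 2))
    elif geometry == "II":   # (X, Y) i.i.d. Gaussian(0, S^2/25)
        x = rng.normal(0.0, S / 5, size=(n_d, 2))
        y = rng.normal(0.0, S / 5, size=(n_d, 2))
    elif geometry == "III":  # X ~ Gaussian(0, S^2/25), Y ~ Gaussian(x_i, S^2/100)
        x = rng.normal(0.0, S / 5, size=(n_d, 2))
        y = rng.normal(x, S / 10)  # second location centered on the first
    else:
        raise ValueError("geometry must be 'I', 'II', or 'III'")
    # Clamp test locations that fall outside the patch to its edge.
    return np.clip(x, -S / 2, S / 2), np.clip(y, -S / 2, S / 2)
```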

Citations
Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments how ORB is at two orders of magnitude faster than SIFT, while performing as well in many situations.
Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is at two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.

8,702 citations


Cites background or methods from "BRIEF: binary robust independent el..."

  • ...Many different types of distributions of tests were considered in [6]; here we use one of the best performers, a Gaussian distribution around the center of the patch....

    [...]

  • ...Descriptors BRIEF [6] is a recent feature descriptor that uses simple binary tests between pixels in a smoothed image patch....

    [...]

Journal ArticleDOI
TL;DR: A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation.
Abstract: This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.

3,807 citations


Cites background or methods from "BRIEF: binary robust independent el..."

  • ...We then discuss map initialization approaches for Monocular SLAM and end with a review of Monocular SLAM systems....

    [...]

  • ...Strasdat et al. [28] demonstrated that keyframe-based techniques are more accurate than filtering for the same computational cost....

    [...]

Proceedings ArticleDOI
06 Nov 2011
TL;DR: A comprehensive evaluation on benchmark datasets reveals BRISK's adaptive, high quality performance as in state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in cases).
Abstract: Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms which exhibit great performance under a variety of image transformations, with SURF in particular considered as the most computationally efficient amongst the high-performance methods to date. In this paper we propose BRISK1, a novel method for keypoint detection, description and matching. A comprehensive evaluation on benchmark datasets reveals BRISK's adaptive, high quality performance as in state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in cases). The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood.

3,292 citations


Cites background or methods from "BRIEF: binary robust independent el..."

  • ...However, despite the clear advantage in speed, the latter approach suffers in terms of reliability and robustness as it has minimal tolerance to image distortions and transformations, in particular to in-plane rotation and scale change....

    [...]

  • ...Their AGAST is essentially an extension for accelerated performance of the now popular FAST, proven to be a very efficient basis for feature extraction....

    [...]

  • ...For the formation of the rotation- and scale-normalized descriptor, BRISK applies the sampling pattern rotated by α = arctan2 (gy, gx) around the keypoint k....

    [...]

  • ...The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood....

    [...]

Journal ArticleDOI
TL;DR: A novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection, and develops a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: P-expert estimates missed detections, and N-expert estimates false alarms.
Abstract: This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: (1) P-expert estimates missed detections, and (2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

3,137 citations


Cites background or result from "BRIEF: binary robust independent el..."

  • ...Similarly as in [60], [61], and [62], the pixel comparisons are generated offline at random and stay fixed in runtime....

    [...]

  • ...This is in contrast to standard approaches [60], [61], [62], where every pixel comparison is generated independent of other pixel comparisons....

    [...]

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Book ChapterDOI
07 May 2006
TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.

13,011 citations

Journal ArticleDOI
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

12,449 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Book ChapterDOI
07 May 2006
TL;DR: It is shown that machine learning can be used to derive a feature detector which can fully process live PAL video using less than 7% of the available processing time.
Abstract: Where feature points are used in real-time frame-rate applications, a high-speed feature detector is necessary. Feature detectors such as SIFT (DoG), Harris and SUSAN are good methods which yield high quality features, however they are too computationally intensive for use in real-time applications of any complexity. Here we show that machine learning can be used to derive a feature detector which can fully process live PAL video using less than 7% of the available processing time. By comparison neither the Harris detector (120%) nor the detection stage of SIFT (300%) can operate at full frame rate. Clearly a high-speed detector is of limited use if the features produced are unsuitable for downstream processing. In particular, the same scene viewed from two different positions should yield features which correspond to the same real-world 3D locations [1]. Hence the second contribution of this paper is a comparison of corner detectors based on this criterion applied to 3D scenes. This comparison supports a number of claims made elsewhere concerning existing corner detectors. Further, contrary to our initial expectations, we show that despite being principally constructed for speed, our detector significantly outperforms existing feature detectors according to this criterion.

3,828 citations

Frequently Asked Questions (14)
Q1. What are the contributions mentioned in the paper "BRIEF: binary robust independent elementary features"?

The authors propose to use binary strings as an efficient feature point descriptor, which they call BRIEF. The authors show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. The authors compare it against SURF and U-SURF on standard benchmarks and show that it yields a similar or better recognition performance, while running in a fraction of the time required by either. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. 

In future work, the authors will incorporate orientation and scale invariance into BRIEF so that it can compete with SURF and SIFT in a wider set of situations. 

SURF addresses the issue of speed but, since the descriptor is a 64-vector of floating-point values, representing it still requires 256 bytes.

Since establishing a match can be understood as classifying pairs of points as being a match or not, a classifier that relies on these Hamming distances will work best when their distributions are most separated. 

The maximum possible Hamming distance being 32 · 8 = 256 bits, unsurprisingly, the distribution of distances for non-matching points is roughly Gaussian and centered around 128. 

In other words, on data sets such as those that involve only modest amounts of in-plane rotation, there is a cost not only in terms of speed but also of recognition rate to achieving orientation invariance, as already pointed out in [4]. 

Another way to binarize the GIST descriptor is to use nonlinear Neighborhood Component Analysis [12, 13], which seems more powerful but probably slower at run-time. 

When creating such descriptors, the only choices that have to be made are those of the kernels used to smooth the patches before intensity differencing and the spatial arrangement of the (x,y)-pairs. 

Generating a length nd bit vector leaves many options for selecting the nd test locations (xi,yi) of Eq. 1 in a patch of size S × S. 

Feature point descriptors are now at the core of many Computer Vision technologies, such as object recognition, 3D reconstruction, image retrieval, and camera localization. 

Chief among them is the latest OpenCV implementation of the SURF descriptor [4], which has become a de facto standard for fast-to-compute descriptors. 

To confirm this, the authors detected SURF points in both images of each test pair and computed their (SURF- or BRIEF-) descriptors, matched these descriptors to their nearest neighbor, and applied a standard left-right consistency check.

The individual bits are obtained by comparing the intensities of pairs of points along the same lines as in [9] but without requiring a training phase. 

In [6], it is shown that for some parameter settings of the DAISY descriptor, PCA and quantization can be combined to reduce its size to 60 bits.