
LDAHash:
Improved Matching with Smaller Descriptors
Christoph Strecha, Alexander M. Bronstein, Member, IEEE,
Michael M. Bronstein, Member, IEEE, and Pascal Fua, Senior Member, IEEE
Abstract—SIFT-like local feature descriptors are ubiquitously employed in computer vision applications such as content-based
retrieval, video analysis, copy detection, object recognition, photo tourism, and 3D reconstruction. Feature descriptors can be designed
to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations.
However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors
are only approximately invariant in practice. Second, descriptors are usually high dimensional (e.g., SIFT is represented as a
128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor
data. We map the descriptor vectors into the Hamming space in which the Hamming metric is used to compare the resulting
representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor
invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
Index Terms—Local features, SIFT, DAISY, binarization, similarity-sensitive hashing, metric learning, 3D reconstruction, matching.
1 INTRODUCTION
Over the last decade, feature point descriptors such as
SIFT [1] and similar methods [2], [3], [4] have become
indispensable tools in the computer vision community. They
are usually represented as high-dimensional vectors, such as
the 128-dimensional SIFT or the 64-dimensional SURF
vectors. While a descriptor’s high dimensionality is not an
issue when only a few hundred points need to be
represented, it becomes a significant concern when millions
have to be stored on a device with limited computational and
storage resources. This happens, for example, when storing
all descriptors for a large-scale urban scene on a mobile
phone for image-based location purposes. Not only does this
require a tremendous amount of storage, it is also slow and
potentially unreliable because most recognition algorithms
rely on nearest-neighbor computations and computing
euclidean distances between long vectors is neither cheap
nor optimal.
Consequently, there have been many recent attempts at
compacting SIFT-like descriptors to allow for faster match-
ing while retaining their outstanding recognition rates. One
class of techniques relies on quantization [5], [6] and
dimensionality reduction [7], [8]. While helpful, this
approach is usually not sufficient to produce truly short
descriptors without loss of matching performance. Another
class [9], [10], [11], [12] takes advantage of training data to
learn short binary codes whose distances are small for
positive training pairs and large for others. This is
particularly promising because not only does binarization
reduce the descriptor size, but also partly increases
performance, as will be shown.
Binarization is usually performed by multiplying the
descriptors by a projection matrix, subtracting a threshold
vector, and retaining only the sign of the result. This maps
the data into a space of binary strings, greatly reducing their
size on the one hand and simplifying their similarity
computation (now becoming the Hamming metric, which
can be computed very efficiently on modern CPUs) on the
other. Another class of locality-sensitive hashing (LSH)
techniques and their variants [9], [13] encode similarity of
data points as the collision probability of their binary codes.
While such similarity can be evaluated very efficiently,
these techniques usually require a large number of hashing
functions to be constructed in order to achieve competitive
performance. Also, families of LSH functions have been
constructed only for classes of standard metrics, such as the
$L_p$ norms, and do not allow for supervision.
In most supervised binarization techniques based on a
linear projection, the matrix entries and thresholds are
selected so as to preserve similarity relationships in a training
set. Doing this efficiently involves solving a difficult non-
linear optimization problem and most of the existing
methods offer no guarantee of finding a global optimum.
By contrast, spectral hashing (SH) [14] does offer this
guarantee for simple data distributions and has proved very
successful. However, this approach is only weakly super-
vised by imposing a euclidean metric on the input data,
which we will argue is not a particularly good one in our case.
. C. Strecha is with the EPFL/IC/ISIM/CVLab, Station 14, Lausanne CH-
1015, Switzerland. E-mail: christoph.strecha@epfl.ch.
. A.M. Bronstein is with the Department of Computer Science, Technion-
Israel Institute of Technology, Room 341, Taub Building, Haifa 32000,
Israel. E-mail: bron@cs.technion.ac.il.
. M.M. Bronstein is with the Institute of Computational Science, Faculty of
Informatics, Università della Svizzera Italiana, Via Giuseppe Buffi 13, Lugano 6900, Switzerland.
E-mail: michael.bronstein@usi.ch.
. P. Fua is with the IC-CVLab, Station 14, EPFL, Lausanne CH-1015,
Switzerland. E-mail: pascal.fua@epfl.ch.
Manuscript received 27 Aug. 2010; revised 23 Jan. 2011; accepted 3 Mar.
2011; published online 13 May 2011.
Recommended for acceptance by F. Dellaert.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2010-08-0660.
Digital Object Identifier no. 10.1109/TPAMI.2011.103.

To better take advantage of training data composed of
interest point descriptors corresponding to multiple 3D
points seen under different views, we introduce a global
optimization scheme that is inspired by an earlier local
optimization one [10]. In [10], the entries of the projection
matrix and thresholds vectors are constructed progressively
using AdaBoost. Given that AdaBoost is a gradient-based
method [15] and that the algorithm optimizes a few matrix
rows at a time, there is no guarantee the solution it finds is
optimal. By contrast, we first compute a projection matrix
that is designed either to solely minimize the in-class
covariance of the descriptors or to jointly minimize the in-
class covariance and maximize the covariance across
classes, both of which can be achieved in closed form. This
being done, we compute optimal thresholds that turn the
projections into binary vectors so as to maximize recogni-
tion rates. In essence, we perform Linear Discriminant
Analysis (LDA) on the descriptors before binarization and
will therefore refer to our approach as LDAHash.
Our experiments show that state-of-the-art metric learn-
ing methods based, e.g., on margin maximization [16], [17]
achieve exceptional performance in the low false negative
rate range, which degrades significantly in the low false
positive rate range. Binarization usually only deteriorates
performance. In large-scale applications that involve match-
ing keypoints against databases containing millions of
them, achieving good performance in the low false positive
rate range is crucial to preventing a list of potential matches
from becoming unacceptably long. We use ROC curves to
show that, in many different cases, the proposed method
has competitive performance in the low false negative range
while significantly outperforming other methods in the low
false positive range.
We also show that unlike many other techniques where
binarization produces performance degradation, using our
approach to binarize SIFT descriptors [1] actually improves
matching performance. This is especially true in the low
false positive range with 64- or 128-bit descriptors, which
means that they are about 10 to 20 times shorter than the
original ones. Furthermore, using competing approaches
[10], [14], [18] to produce descriptors of the same size as
ours results in lower matching performance over the full
false positive range.
In the following section, we briefly survey existing
approaches to binarization. In Section 3, we introduce our
own framework. In Section 4, we describe the correspond-
ing training methodology, training data, and analyze the
impact of individual components of our approach. Finally,
we present our results in Section 5.
2 PRIOR WORK
Most approaches for compacting SIFT-like descriptors and
allowing for faster matching rely on one or more of the
following techniques:
2.1 Tuning
In [8], [19], [6], [20], [18], the authors use training to optimize
the filtering and normalization steps that produce a SIFT-like
vector. The same authors optimize in [18] over the position of
the elements that make up a DAISY descriptor [4].
2.2 Quantization
The SIFT descriptor can be quantized using, for instance,
only 4 bits per coordinate [5], [18], thus saving memory and
speeding up matching because comparing short vectors is
faster than comparing long ones. Chandrasekhar et al. [20]
applied tree-coding methods for lossy compression of
probability distributions to SIFT-like descriptors to obtain
a compressed histogram of gradients (CHOG).
2.3 Dimensionality Reduction
PCA has been extensively used to reduce the dimensionality
of SIFT vectors [21], [6]. In this way, the number of bits
required to describe each dimension can be reduced without
loss in matching performance [6], [18]. In [22], a whitening
linear transform was additionally proposed to benefit from
the efficiency of fast nearest-neighbor search methods.
The three approaches above are mostly unsupervised
methods and sometimes require a complex optimization
scheme [20], [18]. Often, they are not specifically tuned for
keypoint matching and do not usually produce descriptors as
short as one would require for large-scale keypoint matching.
Our formulation relates to supervised metric learning
approaches. The problem of optimizing SIFT-like descrip-
tors can be approached from the perspective of metric
learning, where many efficient approaches have been
recently developed for learning similarity between data
from a training set of similar and dissimilar pairs [23], [24].
In particular, similarity-sensitive hashing (SSH) or locality-
sensitive hashing [9], [10], [14], [11], [12] algorithms seek to
find an efficient binary representation of high-dimensional
data maintaining their similarity in the new space. These
methods have also been applied to global image descrip-
tors and bag-of-feature representations in content-based
image search [25], [26], [27], [28], video copy detection [29],
and shape retrieval [30]. In [31] and [32], Hamming
embedding was used to replace vector quantization in
bag-of-feature construction.
There are a few appealing properties of similarity-
sensitive hashing methods in large-scale descriptor match-
ing applications. First, such methods combine the effects of
dimensionality reduction and binarization, which makes
the descriptors more compact and easier to store. Second,
the metric between the binarized descriptors is learned
from examples and renders their similarity more correctly.
In particular, it is possible to take advantage of feature
point redundancy and transitive closures in the training
set, such as those in Fig. 3. Finally, comparison of binary
descriptors is computationally very efficient and is amen-
able for efficient indexing.
Existing methods for similarity-sensitive hashing have a
few serious drawbacks in our application. The method of
Shakhnarovich [10] poses the similarity-sensitive hashing
problem as boosted classification and tries to find its solution
by means of a standard AdaBoost algorithm. However,
given that AdaBoost is a greedy algorithm equivalent to a
gradient-based method [15], there is no guarantee of global
optimality of the solution. The spectral hashing algorithm
[14], on the other hand, has a tacit underlying assumption of
euclidean descriptor similarity, which is typically far from
being correct. Moreover, it is worthwhile mentioning that
spectral hashing, similarity-sensitive hashing, and similar
methods have so far proven to be very efficient in retrieval
applications for ranking the matches, in which one typically
tries to achieve high recall. Thus, the operating point in these
applications is at low false negative rates, which ensures that
no relevant matches (typically, only a few) are missed. In
large-scale descriptor matching, on the other hand, one has
to create a list of likely candidate matches, which can be very
large if the false positive rate is high. For example, given a set
of 1M descriptors, which is modest for Internet-scale applications, and a 1 percent false positive rate, 10K candidates would have to be considered. Consequently, an
important concern in this application is a very low false
positive rate. As we show in the following, our approach is
especially successful at this operating point, while existing
algorithms show poor performance.
3 APPROACH
Let us assume we are given a large set of keypoint
descriptors. They are grouped into subsets corresponding
to the same 3D points and all pairs within the subsets
are therefore considered as belonging to the same class. The
main idea of our method is to find a mapping from the
descriptor space to the Hamming space by means of an
affine map followed by a sign function such that the
Hamming distance between the binarized descriptors is as
close as possible to the similarity of the given data set. Our
method involves two key steps:
Projection selection. We compute a projection matrix that is
designed either to solely minimize the in-class covariance of
the descriptors or to jointly minimize the in-class covariance
and maximize the covariance across classes, both of which
can be done in closed form (Sections 3.3.1 and 3.3.2).
Threshold selection. We find thresholds that can be used to
binarize the projections so that the resulting binary strings
maximize recognition rates. We show that this threshold
selection is a separable problem that can be solved using 1D
search. In the remainder of this section, we formalize these
steps and describe them in more detail.
3.1 Problem Formulation
Our set of keypoint descriptors is represented as $n$-dimensional vectors in $\mathbb{R}^n$. We attempt to find their representation in some metric space $(\mathbb{Z}, d_{\mathbb{Z}})$ by means of a map of the form $y : \mathbb{R}^n \to (\mathbb{Z}, d_{\mathbb{Z}})$. The metric $d_{\mathbb{Z}}(y, y')$ parameterizes the similarity between the feature descriptors, which may be difficult to compute in the original representation. Our goal in finding such a mapping is twofold. First, $\mathbb{Z}$ should be an efficient representation. This implies that $y(x)$ requires significantly less storage than $x$, and that $d_{\mathbb{Z}}(y(x), y(x'))$ is much easier to compute than, e.g., $\|x - x'\|$. Second, the metric $d_{\mathbb{Z}}(y, y')$ should better represent some ideal descriptor similarity, in the following sense: Given a set $\mathcal{P}$ of pairs of descriptors from corresponding points in different images, e.g., the same object under a different view point (referred to as positives), and a set $\mathcal{N}$ of pairs of descriptors from different points (negatives), we would like $d_{\mathbb{Z}}(y(x), y(x')) < R$ for all $(x, x') \in \mathcal{P}$ and $d_{\mathbb{Z}}(y(x), y(x')) > R$ for all $(x, x') \in \mathcal{N}$ to hold with high probability for some range $R$.
Setting $\mathbb{Z}$ to be the $m$-dimensional Hamming space $\mathbb{H}^m = \{\pm 1\}^m$, the embedding of a descriptor $x$ can be expressed as an $m$-dimensional binary string. Here, we limit our attention to affine embeddings of the form
$$y = \mathrm{sign}(Px + t), \qquad (1)$$
where $P$ is an $m \times n$ matrix and $t$ is an $m \times 1$ vector; embeddings having more complicated forms can be obtained in a relatively straightforward manner by introducing kernels. Even under the optimistic assumption that real numbers can be quantized and represented by 8 bits, the size of the original descriptor is $8n$ bits, while the size of the binary representation is $m$ bits. Thus, setting $m \ll n$ allows us to significantly alleviate the storage complexity and potentially improve descriptor indexing.
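For illustration, the following minimal NumPy sketch applies an affine binarization of the form (1) and packs the resulting bits into bytes; the random matrix, zero thresholds, and the sizes $n = 128$, $m = 64$ are placeholders standing in for the learned quantities, not values from this work.

```python
import numpy as np

def binarize(X, P, t):
    """Map n-dimensional descriptors to m-bit strings via y = sign(P x + t).

    X : (N, n) descriptors, P : (m, n) projection, t : (m,) thresholds.
    Returns an (N, m) array of bits in {0, 1} (sign +1 -> 1, sign -1 -> 0).
    """
    projections = X @ P.T + t            # (N, m) affine projections
    return (projections > 0).astype(np.uint8)

# Toy usage with random placeholders: n = 128 (SIFT), m = 64 bits.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))         # stand-in descriptors
P = rng.normal(size=(64, 128))           # stand-in for a learned projection
t = np.zeros(64)                         # stand-in for learned thresholds
bits = binarize(X, P, t)                 # (1000, 64) binary codes
codes = np.packbits(bits, axis=1)        # 8 bytes per descriptor
```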
Furthermore, the descriptor dissimilarity is computed in our representation using the Hamming metric $d_{\mathbb{H}^m}(y, y') = \frac{m}{2} - \frac{1}{2}\sum_{i=1}^{m} \mathrm{sign}(y_i\, y'_i)$, which is done by performing an XOR operation between $y$ and $y'$ and counting the number of nonzero bits in the result, an operation carried out in a single instruction on modern CPU architectures (POPCNT, SSE4.2).
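As an illustration of this step, the sketch below computes the Hamming distance between two packed codes with an XOR followed by a bit count, emulating the single-instruction POPCNT with a per-byte lookup table; the helper names are ours, not part of the method.

```python
import numpy as np

# Per-byte popcount lookup table (software stand-in for the POPCNT instruction).
_POPCOUNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(axis=1)

def hamming_distance(code_a, code_b):
    """Hamming distance between two packed binary codes (uint8 arrays).

    XOR marks the bits in which the codes differ; summing the popcounts of
    the resulting bytes counts them.
    """
    return int(_POPCOUNT[np.bitwise_xor(code_a, code_b)].sum())

# e.g., hamming_distance(codes[0], codes[1]) for codes packed with np.packbits.
```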
The embedding $y$ is constructed to minimize the expectation of the Hamming metric on the set of positive pairs while maximizing it on the set of negative pairs. This can be expressed as minimization of the loss function
$$L = \alpha\, E\{d_{\mathbb{H}^m}(y, y') \mid \mathcal{P}\} - E\{d_{\mathbb{H}^m}(y, y') \mid \mathcal{N}\} \qquad (2)$$
with respect to the projection parameters $P$ and $t$. Here, $\alpha$ is a parameter controlling the trade-off between false positive and false negative rates (higher $\alpha$ corresponds to lower false negative rates). In practice, the conditional expectations $E\{\cdot \mid \mathcal{P}\}$, $E\{\cdot \mid \mathcal{N}\}$ are replaced by averages on a training set of positive and negative pairs of descriptors, respectively.
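A minimal sketch of such an empirical estimate, assuming the codes are stored as $\pm 1$ arrays and using the identity $d_{\mathbb{H}^m}(y, y') = m/2 - y^{\mathrm{T}} y'/2$; the function signature and array layout are illustrative only.

```python
import numpy as np

def empirical_loss(pos_a, pos_b, neg_a, neg_b, alpha=1.0):
    """Training-set estimate of L = alpha * E{d_H | positives} - E{d_H | negatives}.

    Each argument is an (N, m) array of +/-1 codes; row i of the *_a and *_b
    arrays forms one labeled pair.
    """
    m = pos_a.shape[1]
    d_pos = 0.5 * (m - np.sum(pos_a * pos_b, axis=1))   # Hamming distances, positives
    d_neg = 0.5 * (m - np.sum(neg_a * neg_b, axis=1))   # Hamming distances, negatives
    return alpha * d_pos.mean() - d_neg.mean()
```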
3.2 LDAHash
Here, we note that up to constants, problem (2) is equivalent
to the minimization of
$$L = E\{y^{\mathrm{T}} y' \mid \mathcal{N}\} - \alpha\, E\{y^{\mathrm{T}} y' \mid \mathcal{P}\} \qquad (3)$$
or
$$L = \alpha\, E\{\|y - y'\|^2 \mid \mathcal{P}\} - E\{\|y - y'\|^2 \mid \mathcal{N}\}, \qquad (4)$$
attempting to make the correlation of the binary codes as negative as possible for negative pairs and as positive as possible for positive pairs. Direct minimization of $L$ is difficult since the terms $y$ involve a nondifferentiable sign nonlinearity. While, in principle, smooth approximation is possible, the solution of the resulting nonconvex problem in $(m + 1)n$ variables is challenging, typically containing thousands of unknowns.
As an alternative, we propose to relax the problem, removing the sign and minimizing a related function:
$$\tilde{L} = \alpha\, E\{\|Px - Px'\|^2 \mid \mathcal{P}\} - E\{\|Px - Px'\|^2 \mid \mathcal{N}\}. \qquad (5)$$
The above objective is independent of the affine term $t$, and optimization can be performed over the projection matrix $P$ only, which we further restrict to be orthogonal. Once the optimal matrix is found, we can fix it and minimize a smooth version of (4) with respect to $t$.

3.3 Projection Selection
Next, we describe two different approaches for computing
P, which we refer to as LDA and Difference of Covariances
(DIF) and that we compare in Sections 4 and 5.
3.3.1 Linear Discriminant Analysis
We start by observing that
$$E\{\|Px - Px'\|^2 \mid \mathcal{P}\} = \mathrm{tr}\big(P \Sigma_{\mathcal{P}} P^{\mathrm{T}}\big),$$
where $\Sigma_{\mathcal{P}} = E\{(x - x')(x - x')^{\mathrm{T}} \mid \mathcal{P}\}$ is the covariance matrix of the positive descriptor vector differences. This leads to
$$\tilde{L} = \alpha\, \mathrm{tr}\big(P \Sigma_{\mathcal{P}} P^{\mathrm{T}}\big) - \mathrm{tr}\big(P \Sigma_{\mathcal{N}} P^{\mathrm{T}}\big),$$
with $\Sigma_{\mathcal{N}} = E\{(x - x')(x - x')^{\mathrm{T}} \mid \mathcal{N}\}$ being the covariance matrix of the negative descriptor vector differences.
Transforming the coordinates by premultiplying $x$ by $\Sigma_{\mathcal{N}}^{-1/2}$ turns the second term of $\tilde{L}$ into a constant for any unitary $P$, leaving
$$\tilde{L} \propto \mathrm{tr}\big(P \Sigma_{\mathcal{N}}^{-1/2} \Sigma_{\mathcal{P}} \Sigma_{\mathcal{N}}^{-\mathrm{T}/2} P^{\mathrm{T}}\big) = \mathrm{tr}\big(P \Sigma_{\mathcal{P}} \Sigma_{\mathcal{N}}^{-1} P^{\mathrm{T}}\big) = \mathrm{tr}\big(P \Sigma_R P^{\mathrm{T}}\big), \qquad (6)$$
where $\Sigma_R = \Sigma_{\mathcal{P}} \Sigma_{\mathcal{N}}^{-1}$ is the ratio of the positive and negative covariance matrices. Since $\Sigma_R$ is a symmetric positive semidefinite matrix, it admits the eigendecomposition $\Sigma_R = U S U^{\mathrm{T}}$, where $S$ is a nonnegative diagonal matrix. An orthogonal $m \times n$ matrix $P$ minimizing the trace of $P \Sigma_R P^{\mathrm{T}}$ is a projection onto the space spanned by the $m$ smallest eigenvectors of $\Sigma_R$, and the minimizer of $\tilde{L}$ is given by
$$P = (\Sigma_R)_m^{-1/2}\, \Sigma_{\mathcal{N}}^{-1/2} = \tilde{S}_m^{-1/2} \tilde{U}^{\mathrm{T}} \Sigma_{\mathcal{N}}^{-1/2}, \qquad (7)$$
where $\tilde{S}$ is the $m \times m$ matrix with the smallest eigenvalues and $\tilde{U}$ is the $n \times m$ matrix with the corresponding eigenvectors (for notation brevity, we denote such a projection by $(\Sigma_R)_m^{-1/2}$). This approach resembles the spirit of linear discriminant analysis. A similar technique has been introduced in [29] within the framework of boosted similarity learning. Note that the normalization of the columns of $P$ is unimportant since a sign function is applied to its output. However, we keep the normalization by the inverse square root of the variances, which makes the projected differences $P(x - x')$ normal and white.
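The projection selection above could be sketched in NumPy as follows. The sketch eigendecomposes the symmetric whitened positive covariance $\Sigma_{\mathcal{N}}^{-1/2}\Sigma_{\mathcal{P}}\Sigma_{\mathcal{N}}^{-1/2}$, which has the same eigenvalues as $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{N}}^{-1}$, and keeps the $m$ smallest directions; the regularization constant `eps` and the interface are illustrative implementation choices, not part of the method.

```python
import numpy as np

def lda_projection(diff_pos, diff_neg, m, eps=1e-6):
    """LDA-style projection (Sec. 3.3.1): favor directions in which positive
    descriptor differences are small relative to negative ones.

    diff_pos, diff_neg : (N, n) arrays of descriptor differences x - x' for
    positive and negative pairs. Returns an (m, n) projection matrix P.
    """
    cov_p = np.cov(diff_pos, rowvar=False)
    cov_n = np.cov(diff_neg, rowvar=False) + eps * np.eye(diff_neg.shape[1])

    # Inverse square root of the negative covariance (whitening transform).
    w, V = np.linalg.eigh(cov_n)
    cov_n_isqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Symmetric whitened positive covariance; same spectrum as cov_p @ inv(cov_n).
    ratio = cov_n_isqrt @ cov_p @ cov_n_isqrt
    s, U = np.linalg.eigh(ratio)                  # ascending eigenvalues
    U_m, s_m = U[:, :m], np.maximum(s[:m], eps)   # m smallest directions

    # Inverse-square-root normalization so that projected positive
    # differences are approximately white (cf. eq. (7)).
    return np.diag(1.0 / np.sqrt(s_m)) @ U_m.T @ cov_n_isqrt
```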
3.3.2 Difference of Covariances
An alternative approach can be derived by observing that
$$\tilde{L} = \mathrm{tr}\big(P \Sigma_D P^{\mathrm{T}}\big),$$
where $\Sigma_D = \alpha \Sigma_{\mathcal{P}} - \Sigma_{\mathcal{N}}$. This yields
$$P = (\Sigma_D)_m^{-1/2}, \qquad (8)$$
where at most $m$ smallest negative eigenvectors are selected. This selection of the projection matrix will be referred to as covariance difference and denoted by DIF. Note that it allows controlling the trade-off between false positive and negative rates through the parameter $\alpha$, which is impossible in the LDA approach.
The limit $\alpha \to \infty$ is of particular interest as it yields $\Sigma_D \propto \Sigma_{\mathcal{P}}$. In this case, the negative covariance does not play any role in the training, which is equivalent to assuming that the differences of negative descriptor vectors are white Gaussian, $\Sigma_{\mathcal{N}} = I$. The corresponding projection matrix is given by
$$P = (\Sigma_{\mathcal{P}})_m^{-1/2}. \qquad (9)$$
The main advantage of this approach is that it allows
learning the projection in a semi-supervised setting when
only positive pairs are available.
In general, a fully supervised approach is advantageous
over its semi-supervised counterpart, which assumes a
sometimes unrealistic unit covariance of the negative class
differences. However, unlike the positive training set
containing only pairs of knowingly matching descriptors,
the negative set might be contaminated by positive pairs (a
situation usually referred to as label noise). If such a
contamination is significant, the semi-supervised setting is
likely to perform better.
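Under the same caveats as the LDA sketch above, a corresponding sketch of the DIF selection (8) follows; the `eps` regularization, the fallback when fewer than $m$ negative eigenvalues exist, and the $|\lambda|^{-1/2}$ normalization are assumptions of this illustration, not prescriptions of the paper.

```python
import numpy as np

def dif_projection(diff_pos, diff_neg, m, alpha=1.0, eps=1e-6):
    """Covariance-difference (DIF) projection (Sec. 3.3.2).

    Eigendecomposes alpha * cov_pos - cov_neg and keeps at most m directions
    with negative eigenvalues, normalized by the square roots of |eigenvalue|.
    Returns a (k, n) projection with k <= m rows.
    """
    cov_d = alpha * np.cov(diff_pos, rowvar=False) - np.cov(diff_neg, rowvar=False)
    s, U = np.linalg.eigh(cov_d)                  # ascending: most negative first
    neg = int(np.sum(s < 0))
    keep = min(m, neg) if neg > 0 else m          # fallback: just take m smallest
    U_m, s_m = U[:, :keep], np.abs(s[:keep]) + eps
    return np.diag(1.0 / np.sqrt(s_m)) @ U_m.T
```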
3.4 Threshold Selection
Given the projection matrix P selected as described in the
previous section, our next step is to minimize a smooth
version of the loss function (3),
$$
\begin{aligned}
L &= E\{\mathrm{sign}(Px + t)^{\mathrm{T}} \mathrm{sign}(Px' + t) \mid \mathcal{N}\} - \alpha\, E\{\mathrm{sign}(Px + t)^{\mathrm{T}} \mathrm{sign}(Px' + t) \mid \mathcal{P}\} \\
  &= \sum_{i=1}^{m} \Big[ E\{\mathrm{sign}(p_i^{\mathrm{T}} x + t_i)\, \mathrm{sign}(p_i^{\mathrm{T}} x' + t_i) \mid \mathcal{N}\} - \alpha\, E\{\mathrm{sign}(p_i^{\mathrm{T}} x + t_i)\, \mathrm{sign}(p_i^{\mathrm{T}} x' + t_i) \mid \mathcal{P}\} \Big],
\end{aligned} \qquad (10)
$$
with respect to the thresholds $t$, where $p_i^{\mathrm{T}}$ denotes the $i$th row of $P$ and $t_i$ denotes the $i$th element of $t$. Observe that due to its separable form, the problem can be split into independent subproblems
$$\min_{t_i}\; E\{\mathrm{sign}(p_i^{\mathrm{T}} x + t_i)\, \mathrm{sign}(p_i^{\mathrm{T}} x' + t_i) \mid \mathcal{N}\} - \alpha\, E\{\mathrm{sign}(p_i^{\mathrm{T}} x + t_i)\, \mathrm{sign}(p_i^{\mathrm{T}} x' + t_i) \mid \mathcal{P}\}, \qquad (11)$$
which in turn can be solved using simple 1D search over each threshold $t_i$.
Let $y = p_i^{\mathrm{T}} x$ and $y' = p_i^{\mathrm{T}} x'$ be the $i$th elements of the projected training vectors $x$ and $x'$. The $i$th bits of $y$ and $y'$ coincide if $t_i < \min\{y, y'\}$ or $t_i > \max\{y, y'\}$, and differ if $\min\{y, y'\} \le t_i \le \max\{y, y'\}$. For a given value of the threshold, the false negative rate, i.e., the probability that the bits of a positive pair differ, can be expressed as
$$
\begin{aligned}
\mathrm{FN}(t) &= \Pr(\min\{y, y'\} < t \le \max\{y, y'\} \mid \mathcal{P}) \\
               &= \Pr(\min\{y, y'\} < t \mid \mathcal{P}) - \Pr(\max\{y, y'\} < t \mid \mathcal{P}) \\
               &= \mathrm{cdf}(\min\{y, y'\} \mid \mathcal{P}) - \mathrm{cdf}(\max\{y, y'\} \mid \mathcal{P}),
\end{aligned} \qquad (12)
$$
with cdf standing for the cumulative distribution function. Similarly, the false positive rate, i.e., the probability that the bits of a negative pair coincide, can be expressed as
$$
\begin{aligned}
\mathrm{FP}(t) &= \Pr(\min\{y, y'\} \ge t \text{ or } \max\{y, y'\} < t \mid \mathcal{N}) \\
               &= 1 - \Pr(\min\{y, y'\} < t \mid \mathcal{N}) + \Pr(\max\{y, y'\} < t \mid \mathcal{N}) \\
               &= 1 - \mathrm{cdf}(\min\{y, y'\} \mid \mathcal{N}) + \mathrm{cdf}(\max\{y, y'\} \mid \mathcal{N}).
\end{aligned} \qquad (13)
$$
We compute histograms of the minimal and maximal values of the projected positive and negative pairs, from which the cumulative densities are estimated. The optimal threshold $t_i$ is selected to minimize $\mathrm{FP} + \mathrm{FN}$ (or, alternatively, to maximize $\mathrm{TN} + \mathrm{TP}$, where $\mathrm{TP} = 1 - \mathrm{FN}$ and $\mathrm{TN} = 1 - \mathrm{FP}$ are the true positive and true negative rates, respectively). Fig. 1 visualizes TP, TN, and $\mathrm{TP} - \mathrm{FP}$ for the first two components $i = 1, 2$ of the LDA and DIF projections.
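The per-dimension 1D search could be sketched as follows, estimating the CDFs of the pairwise minima and maxima from the training projections; the candidate grid and helper names are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def empirical_cdf(sorted_values, t):
    """Fraction of values strictly below t, for a pre-sorted array."""
    return np.searchsorted(sorted_values, t) / len(sorted_values)

def select_threshold(pos_a, pos_b, neg_a, neg_b, n_candidates=256):
    """1D threshold search for one projection dimension i (cf. Section 3.4).

    pos_a/pos_b (resp. neg_a/neg_b) hold the scalar projections p_i^T x and
    p_i^T x' of the positive (resp. negative) training pairs. FN(t) is the
    fraction of positive pairs split by t (bits differ); FP(t) is the fraction
    of negative pairs not split by t (bits coincide). Returns argmin of FN+FP.
    """
    lo_p = np.sort(np.minimum(pos_a, pos_b))
    hi_p = np.sort(np.maximum(pos_a, pos_b))
    lo_n = np.sort(np.minimum(neg_a, neg_b))
    hi_n = np.sort(np.maximum(neg_a, neg_b))

    grid = np.linspace(min(lo_p[0], lo_n[0]), max(hi_p[-1], hi_n[-1]), n_candidates)
    fn = np.array([empirical_cdf(lo_p, t) - empirical_cdf(hi_p, t) for t in grid])
    fp = np.array([1.0 - empirical_cdf(lo_n, t) + empirical_cdf(hi_n, t) for t in grid])
    return grid[np.argmin(fn + fp)]
```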
4 TRAINING METHODOLOGY
In this section, we first describe our ground truth training
and evaluation data. We then evaluate different aspects of
our binary descriptors.
4.1 Ground Truth Data
To build our ground truth database, we used sets of
calibrated images for which we show the 3D point model
and a member image in Figs. 3, 4, 14, 15, and 16. These data
sets contain images we acquired ourselves, such as those in Figs. 14 and 15, sometimes over extended periods of time (Fig. 3). The data sets of Figs. 3, 4, and 15 also contain images downloaded from the Internet, and that of Fig. 16 was acquired entirely from this source.
We used our own calibration pipeline [33] to register
them and to compute internal and external camera
parameters as well as a sparse set of 3D points, each
corresponding to a single keypoint track. First, pairwise
keypoint correspondences are established using Vedaldi’s
[34] SIFT [1] descriptors that we compared using the
standard $L_2$ norm. These are transformed into keypoint
tracks which are used to grow initial reconstructions that
have been obtained by a robust fit of pairwise essential
matrices. This standard procedure is similar to [35] and we
refer to this and our work [33] for more information.
Because our data set contains multiple views of the same
scene, we have many conjunctive closure matches [36] such
as the one depicted by the blue line in Fig. 3 (bottom): A
keypoint that is matched in two other images, as depicted
by the green lines, gives rise to an additional match in these
other two images. Since they may be quite different from
each other, the L
2
distance between the corresponding
descriptors may be large. Yet, the descriptors in all three
images will be treated as belonging to the same class, which
is key to learning a metric that can achieve better matching
performance than the original $L_2$ norm. In our data sets, these conjunctive closures build up long chains in which individual pairs can have a quite large $L_2$ distance, as one can see in Fig. 2. In practice, we consider only chains with
five or more keypoints, i.e., 3D points that are visible in at
least five images.
For the negative examples, we randomly sampled the
same number of keypoint pairs and checked that none of
them belonged to the positive set.
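As an illustration of how such training pairs can be assembled from keypoint tracks, the following sketch enumerates all within-track pairs (including conjunctive closures) as positives and rejection-samples negatives; the function and its parameters are hypothetical, not the authors' pipeline.

```python
import itertools
import random

def build_training_pairs(tracks, num_descriptors, min_track_len=5, seed=0):
    """Assemble positive and negative index pairs from keypoint tracks.

    tracks : list of lists of descriptor indices, one list per reconstructed
    3D point; every pair within a sufficiently long track is a positive,
    including the transitive 'conjunctive closure' pairs never matched
    directly. Negatives are random index pairs rejected if they fall in the
    positive set.
    """
    positives = set()
    for track in tracks:
        if len(track) >= min_track_len:
            positives.update(itertools.combinations(sorted(track), 2))

    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < len(positives):
        i, j = rng.randrange(num_descriptors), rng.randrange(num_descriptors)
        if i != j and (min(i, j), max(i, j)) not in positives:
            negatives.add((min(i, j), max(i, j)))
    return sorted(positives), sorted(negatives)
```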
This training database is more specific than the one used
in [8] and [19], where the authors use a calibrated database
of images and their dense multiview stereo correspon-
dences. However, calibration and dense stereo information is used to extract the image patches, which are centered around 3D point projections, and these are used to build a training database of positive matches. In our framework, we
use the calibration only to geometrically verify SIFT matches
as being consistent with the camera parameters and with
the 3D structure. The 2D position, scale, and orientation of
the original interest points are kept such that we can perform
Fig. 1. Probability density functions of the classification performance for positive and negative training examples, for the first two dimensions of (a) LDA and (b) DIF.
Fig. 2. Some of the keypoints from the same 3D point for the Venice data set in Fig. 16 are shown as an example. The red circle shows the keypoint
(DoG) position and its scale. The track was extracted by consecutive SIFT $L_2$ matching, which makes it possible to include keypoint pairs
(conjunctive closures) that are quite different into the training and evaluation set.
