Query Adaptive Similarity for Large Scale Object Retrieval
Danfeng Qin Christian Wengert Luc van Gool
ETH Zürich, Switzerland
{qind,wengert,vangool}@vision.ee.ethz.ch
Abstract
Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.
1. Introduction
We consider the problem of content-based image retrieval for applications such as object recognition or similar image retrieval. This problem has applications in web image retrieval, location recognition, mobile visual search, and tagging of photos.

Most of the recent state-of-the-art large scale image retrieval systems rely on local features, in particular the SIFT descriptor [14] and its variants. Moreover, these descriptors are typically used jointly with a bag-of-words (BOW) approach, considerably reducing the computational burden and memory requirements in large scale scenarios.

The similarity between two images is usually expressed by aggregating the similarities between corresponding local features. However, to the best of our knowledge, few attempts have been made to systematically analyze how to model the employed similarity measures.

In this paper we present a probabilistic view of the feature to feature similarity. We then derive a measure that is adaptive to the query feature. We show, both on simulated and real data, that the Euclidean distance density distribution is highly query dependent and that our model adapts the original distance accordingly. While it is difficult to know the distribution of true correspondences, it is actually quite easy to estimate the distribution of the distances of non-corresponding features. The expected distance to the non-corresponding features can be used to adapt the original distance and can be efficiently estimated by introducing a small set of random features as negative examples. Furthermore, we derive a global similarity function that scores the feature to feature similarities. Based on simulated data, this function approximates the analytical result.

Moreover, in contrast to some existing methods, our method does not require any parameter tuning to achieve its best performance on different datasets. Despite its simplicity, experimental results on standard benchmarks show that our method improves the retrieval accuracy consistently and significantly and compares favorably to the state-of-the-art.

Furthermore, all recently presented post-processing steps can still be applied on top of our method and yield an additional performance gain.

The rest of this paper is organized as follows. Section 2 gives an overview of related research. Section 3 describes our method in more detail. The experiments for evaluating our approach are described in Section 4. Results in a large scale image retrieval system are presented in Section 5 and compared with the state-of-the-art.
2. Related Work
Most of the recent works addressing the image similarity problem in image retrieval can be roughly grouped into three categories.

Feature-feature similarity The first group mainly works on establishing local feature correspondences. The most famous work in this group is the bag-of-words (BOW) approach [24]. Two features are considered to be similar if they are assigned to the same visual word. Despite the efficiency of the BOW model, the hard visual word assignment significantly reduces the discriminative power of the local features. In order to reduce quantization artifacts, [20] proposed to assign each feature to multiple visual words. In contrast, [8] relies on smaller codebooks in conjunction with short binary codes for each local feature, refining the feature matching within the same Voronoi cell.
2013 IEEE Conference on Computer Vision and Pattern Recognition
1063-6919/13 $26.00 © 2013 IEEE
DOI 10.1109/CVPR.2013.211

Additionally, product quantization [12] was used to estimate the pairwise Euclidean distance between features, and the top k nearest neighbors of a query feature are considered as matches. Recently, several researchers have addressed the problem of the Euclidean distance not being the optimal similarity measure in most situations. For instance, in [16] a probabilistic relationship between visual words is learned from a large collection of corresponding feature tracks. Alternatively, [21] learns a projection from the original feature space to a new space, such that the Euclidean metric in this new space can appropriately model feature similarity.
Intra-image similarity The second group focuses on effectively weighting the similarity of a feature pair considering its relationship to other matched pairs.

Several authors exploit the property that the local features inside the same image are not independent. As a consequence, a direct accumulation of local feature similarities can lead to inferior performance. This problem was addressed in [4] by down-weighting the contribution of non-incidentally co-occurring features. In [9] this problem was approached by re-weighting features according to their burstiness measurement.
As the BOW approach discards spatial information, a scoring step can be introduced which exploits the property that true matched feature pairs should follow a consistent spatial transformation. The authors of [19] proposed to use RANSAC to estimate the homography between images, and only count the contribution of feature pairs consistent with this model. [26] and [23] propose to quantize the image transformation parameter space in a Hough voting manner, and let each matching feature pair vote for its corresponding parameter cells. A feature pair is considered valid if it supports the cell with the maximum number of votes.
Inter-image similarity Finally, the third group addresses the problem of how to improve retrieval performance by exploiting additional information contained in other database images that depict the same object as the query image. [5] relies on query expansion: after retrieving a set of spatially verified database images, this new set is used to query the system again to increase recall. In [22], a set of relevant images is constructed using k-reciprocal nearest neighbors, and the similarity score is evaluated based on how similar a database image is to this set.
Our work belongs to the first group. By formulating the feature-feature matching problem in a probabilistic framework, we propose a similarity that adapts to each query feature, and a similarity function to approximate the quantitative result. Although the idea of adapting similarity by dissimilarity has already been exploited in [11][17], we propose to measure dissimilarity by the mean distance of the query to a set of random features, while those methods use k nearest neighbors (kNN). Since, in a realistic dataset, different objects may have different numbers of relevant images, it is actually quite hard for kNN-based methods to find a k that generalizes across all queries. Moreover, as kNN is an order statistic, it can be sensitive to outliers and cannot be used reliably as an estimator in realistic scenarios. In contrast, in our work, the set of random features can be considered a clean set of negative examples, and the mean operator is actually quite robust, as shown later.
Considering the large amount of data in a typical large scale image retrieval system, it is impractical to compute the pairwise distances between the high-dimensional original feature vectors. However, several approaches exist to relieve that burden using efficient approximations, such as [12, 13, 3, 6]. For simplicity, we adopt the method proposed in [12] to estimate the distance between features.
3. Our Approach
In this section, we present a theoretical framework for modeling the visual similarity between a pair of features, given a pairwise measurement. We then derive an analytical model for computing the accuracy of the similarity estimation in order to compare different similarity measures. Following the theoretical analysis, we continue the discussion on simulated data. Since the distribution of the Euclidean distance varies enormously from one query feature to another, we propose to normalize the distance locally to obtain a similar scale of measurement across queries. Furthermore, using the adaptive measure, we quantitatively analyze the similarity function on the simulated data and propose a function to approximate the quantitative result. Finally, we discuss how to integrate our findings into a retrieval system.
3.1. A probabilistic view of similarity estimation
We are interested in modeling the visual similarity between features based on a pairwise measurement.

Let x_i denote a local feature vector from the query image and Y = {y_1, ..., y_j, ..., y_n} a set of local features from a collection of database images. Furthermore, let m(x_i, y_j) denote a pairwise measurement between x_i and y_j. Finally, T(x_i) represents the set of features which are visually similar to x_i, and F(x_i) the set of features which are dissimilar to x_i. Instead of considering whether y_j is similar to x_i and how similar they look, we want to evaluate how likely it is that y_j belongs to T(x_i) given a measure m. This can be modeled as follows:

    f(x_i, y_j) = p(y_j ∈ T(x_i) | m(x_i, y_j))    (1)

For simplicity, we denote m_j = m(x_i, y_j), T_i = T(x_i), and F_i = F(x_i). As y_j belongs to either T_i or F_i, we have

    p(y_j ∈ T_i | m_j) + p(y_j ∈ F_i | m_j) = 1    (2)

Furthermore, according to Bayes' theorem,

    p(y_j ∈ T_i | m_j) = p(m_j | y_j ∈ T_i) × p(y_j ∈ T_i) / p(m_j)    (3)

and

    p(y_j ∈ F_i | m_j) = p(m_j | y_j ∈ F_i) × p(y_j ∈ F_i) / p(m_j)    (4)

Finally, by combining Equations 2, 3 and 4 we get

    p(y_j ∈ T_i | m_j) = { 1 + [p(m_j | y_j ∈ F_i) / p(m_j | y_j ∈ T_i)] × [p(y_j ∈ F_i) / p(y_j ∈ T_i)] }^(−1)    (5)

For large datasets the quantity p(y_j ∈ T_i) can be modeled by the occurrence frequency of x_i. Therefore, p(y_j ∈ T_i) and p(y_j ∈ F_i) only depend on the query feature x_i.

In contrast, p(m_j | y_j ∈ T_i) and p(m_j | y_j ∈ F_i) are the probability density functions of the distribution of m_j for {y_j | y_j ∈ T_i} and {y_j | y_j ∈ F_i}, respectively. We will show in Section 3.3 how to generate simulated data for estimating these distributions. In Section 3.5 we will further exploit these distributions in our framework.
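As an illustration, the posterior of Equation 5 can be evaluated numerically. The sketch below is a toy example rather than the paper's estimator: the two conditional densities are stand-in Gaussians instead of the empirical distributions of Section 3.3, and c plays the role of the prior ratio p(y_j ∈ F_i)/p(y_j ∈ T_i):

```python
import math

def match_posterior(m, pdf_true, pdf_false, c):
    """Eq. (5): p(y_j in T_i | m_j), given the two conditional densities
    and the prior ratio c = p(y_j in F_i) / p(y_j in T_i)."""
    lt, lf = pdf_true(m), pdf_false(m)
    if lt == 0.0:
        return 0.0
    return 1.0 / (1.0 + (lf / lt) * c)

def gauss(mu, sigma):
    """Stand-in density; the paper estimates these from simulated data."""
    return lambda z: math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

p_true = gauss(0.4, 0.1)    # hypothetical p(m_j | y_j in T_i): matches lie at small distances
p_false = gauss(1.0, 0.15)  # hypothetical p(m_j | y_j in F_i)

post_near = match_posterior(0.4, p_true, p_false, c=1000.0)
post_far = match_posterior(0.9, p_true, p_false, c=1000.0)
```

Even at the mode of the true-match density, a large prior ratio c noticeably pulls the posterior below 1, which is exactly the effect Equation 5 captures.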
3.2. Estimation accuracy
Since the pairwise measurement between features is the only observation for our model, it is essential to estimate its reliability. Intuitively, an optimal measurement should be able to perfectly separate the true correspondences from the false ones. In other words, the better the measurement distinguishes the true correspondences from the false ones, the more accurately the feature similarity based on it can be estimated. Therefore, the measurement accuracy can be modeled as the expected pureness. Let T be the collection of all matched pairs of features, i.e.,

    T = {(x, y) | y ∈ T(x)}    (6)

The probability that a pair of features is a true match given the measurement value z can be expressed as

    p(T | z) = p((x, y) ∈ T | m(x, y) = z)    (7)

Furthermore, the probability of observing a measurement value z given a corresponding feature pair is

    p(z | T) = p(m(x, y) = z | (x, y) ∈ T)    (8)

Then, the accuracy of the similarity estimation is

    Acc(m) = ∫_{−∞}^{∞} p(T | z) × p(z | T) dz    (9)

with m some pairwise measurement and Acc(m) the accuracy of the model based on m. Since

    p(T | z) ≤ 1  and  ∫_{−∞}^{∞} p(z | T) dz = 1    (10)

the accuracy of a measure m satisfies

    Acc(m) ≤ 1    (11)

with equality

    Acc(m) = 1  ⟺  p(T | z) = 1 for all z with p(z | T) > 0    (12)

This measure allows comparing the accuracy of different distance measurements, as will be shown in the next section.
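Since Acc(m) in Equation 9 is the expectation of p(T | z) under p(z | T), it can be approximated by Monte Carlo from samples of matching and non-matching measurements. A minimal sketch on synthetic Gaussian samples (the bin count, range, and distribution parameters are arbitrary choices, not the paper's data):

```python
import random

def accuracy(true_d, false_d, n_bins=50, lo=0.0, hi=2.0):
    """Monte-Carlo version of Eq. (9): Acc(m) = E_{z ~ p(z|T)}[p(T|z)],
    with p(T|z) estimated per histogram bin from the sample counts
    (the priors are implied by the two sample sizes)."""
    def bin_of(z):
        return min(n_bins - 1, max(0, int((z - lo) / (hi - lo) * n_bins)))
    t_cnt, f_cnt = [0] * n_bins, [0] * n_bins
    for z in true_d:
        t_cnt[bin_of(z)] += 1
    for z in false_d:
        f_cnt[bin_of(z)] += 1
    total = 0.0
    for z in true_d:
        b = bin_of(z)
        total += t_cnt[b] / (t_cnt[b] + f_cnt[b])  # bin-wise p(T|z)
    return total / len(true_d)

random.seed(0)
true_d = [random.gauss(0.4, 0.1) for _ in range(5000)]
false_separated = [random.gauss(1.2, 0.1) for _ in range(5000)]
false_overlapping = [random.gauss(0.5, 0.1) for _ in range(5000)]

acc_sep = accuracy(true_d, false_separated)
acc_ovl = accuracy(true_d, false_overlapping)
```

A measurement whose true and false populations are well separated scores near 1; heavily overlapping populations score markedly lower, matching the intuition behind Equation 12.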
3.3. Ground truth data generation
In order to model the properties of T(x_i), we simulate corresponding features using the following method: First, regions r_{i,0} are detected on a random set of images by the Hessian-Affine detector [15]. Then, we apply numerous random affine warpings (using the affine model proposed by ASIFT [25]) to r_{i,0}, generating a set of related regions. Finally, SIFT features are computed on all regions, resulting in {x_{i,1}, x_{i,2}, ..., x_{i,n}} as a subset of T(x_{i,0}).

The parameters for the simulated affine transformation are selected randomly, and some random jitter is added to model the detection errors occurring in a practical setting. The non-corresponding features F(x_i) are simply generated by selecting 500K random patches extracted from a different and unrelated dataset. In this way, we also generate a dataset D containing 100K matched pairs of features from different images, and 1M non-matched pairs. Figure 1 depicts two corresponding image patches randomly selected from the simulated data.
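The warping step can be sketched as follows. The rotation-tilt-scale decomposition and the parameter ranges below are illustrative stand-ins, not ASIFT's exact affine model; the additive jitter mimics detector localization error:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_affine(max_rot=np.pi, max_tilt=0.6, max_scale=0.3, jitter=0.02):
    """One random affine warp: rotation * anisotropic tilt * isotropic
    scale, plus small Gaussian jitter on the matrix entries.
    Ranges are hypothetical, chosen only for illustration."""
    th = rng.uniform(-max_rot, max_rot)
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    T = np.diag([1.0 + rng.uniform(-max_tilt, max_tilt), 1.0])
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    return s * (R @ T) + rng.normal(0.0, jitter, size=(2, 2))

def warp_points(A, pts):
    """Apply the 2x2 affine part to a set of 2-D points."""
    return pts @ A.T

# Corners of a canonical detected region, warped several times to
# produce a family of related regions.
corners = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
warped = [warp_points(random_affine(), corners) for _ in range(5)]
```

Descriptors extracted from such a family of warped regions would then populate a simulated T(x_{i,0}).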
Figure 1. Corresponding image patches for two randomly selected points of the simulated data.
3.4. Query adaptive distance
It has been observed that the Euclidean distance is not an appropriate measurement for similarity [21, 16, 11]. We argue that the Euclidean distance is a robust estimator when normalized locally.

As an example, Figure 2 depicts the distributions of the Euclidean distance of the corresponding and non-corresponding features for the two different interest points shown in Figure 1. For each sample point x_i, we collected a set of 500 corresponding features T(x_i) using the procedure from Section 3.3 and a set of 500K random non-corresponding features F(x_i). It can be seen that the Euclidean distance separates the matching from the non-matching features quite well in the local neighborhood of a given query feature x_i.
However, by averaging the distributions of T(x_i) and F(x_i), respectively, over all queries x_i, the Euclidean distance loses its discriminative power. This explains why the Euclidean distance has inferior performance in estimating visual similarity from a global point of view. A local adaptation is therefore necessary to recover the discriminability of the Euclidean distance.

Figure 2. Distribution of the Euclidean distance for two points from the simulated data. The solid lines show the distributions for corresponding features T(x_i), whereas the dotted lines depict non-corresponding ones F(x_i).
Another property can also be observed in Figure 2: if a feature has a large distance to its correspondences, it also has a large distance to the non-matching features. By exploiting this property, a normalization of the distance can be derived for each query feature:

    d_n(x_i, y_j) = d(x_i, y_j) / N_d(x_i)    (13)
where d_n(·,·) represents the normalized distance, d(·,·) the original Euclidean distance, and N_d(x_i) the expected distance of x_i to its non-matching features. It is intractable to estimate the distance distribution between all features and their correspondences, but it is simple to estimate the expected distance to non-corresponding features. Since the non-corresponding features are independent of the query, a set of randomly sampled, and thus unrelated, features can be used to represent the set of non-corresponding features for each query. Moreover, if we assume the distance distribution of the non-corresponding set to follow a normal distribution N(μ, σ²), then the estimation error of its mean based on a subset follows another normal distribution N(0, σ²/N), with N the size of the subset. Therefore, N_d(x_i) can be estimated sufficiently well and very efficiently from even a small set of random, i.e., non-corresponding, features.
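This estimation step can be sketched directly. The example below uses synthetic 128-dimensional vectors purely for illustration; `negatives` stands in for the small random negative set:

```python
import numpy as np

def normalized_distance(query, db_feats, random_feats):
    """Eq. (13): divide the Euclidean distances d(x_i, y_j) by
    N_d(x_i), estimated as the mean distance from the query to a
    small random (hence non-matching) feature sample."""
    n_d = np.linalg.norm(random_feats - query, axis=1).mean()
    return np.linalg.norm(db_feats - query, axis=1) / n_d

# Synthetic SIFT-like vectors (illustrative only).
rng = np.random.default_rng(1)
query = rng.normal(size=128)
db_feats = rng.normal(size=(1000, 128))
negatives = rng.normal(size=(100, 128))

d_n = normalized_distance(query, db_feats, negatives)
# For unrelated database features, d_n concentrates around 1,
# putting every query feature on a comparable scale.
```

This is the "local adaptation" in action: whatever the absolute scale of a query's distances, its non-matching population is mapped to roughly unit distance.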
The probability that an unknown feature matches the query feature when observing their distance z can be modeled as

    p(T | z) = N_T × p(z | T) / [N_T × p(z | T) + N_F × p(z | F)]
             = { 1 + (N_F / N_T) × p(z | F) / p(z | T) }^(−1)    (14)

with N_T and N_F the numbers of corresponding and non-corresponding pairs, respectively. In practical settings, N_F is usually many orders of magnitude larger than N_T. Therefore, as soon as p(z | F) becomes larger than 0, p(T | z) rapidly decreases, and the corresponding features quickly get confused with the non-corresponding ones.

Figure 3 illustrates how the adaptive distance recovers more correct matches compared to the Euclidean distance.

Moreover, by assuming that N_F / N_T = 1000, the measurement accuracy following Equation 9 can be computed. For the Euclidean distance, the estimation accuracy is 0.7291, and for the adaptive distance, the accuracy is 0.7748. Our proposed distance thus significantly outperforms the Euclidean distance.
3.5. Similarity function
In this section, we show how to derive a globally appropriate feature similarity in a quantitative manner. Having established the distance distribution of the query adaptive distance in the previous section, the only unknown in Equation 5 remains the prior ratio p(y_j ∈ F_i) / p(y_j ∈ T_i).

As discussed in Section 3.1, this quantity is inversely proportional to the occurrence frequency of x_i, and it is generally a very large term. Assuming c = p(y_j ∈ F_i) / p(y_j ∈ T_i) to lie between 10 and 100000, the full similarity function can be estimated and is depicted in Figure 4.

The resulting curves follow an inverse sigmoid form such that the similarity is 1 for d_n ≈ 0 and 0 for d_n ≥ 1. They all have roughly the same shape and differ approximately only by an offset. It is to be noted that they show a very sharp transition, making it very difficult to correctly estimate the transition point and thus to achieve a good separation between true and false matches.

In order to reduce the estimation error due to such sharp transitions, a smoother curve would be desirable. Since the distance distributions are all long-tailed, we have fitted different kinds of exponential functions to those curves and observed similar results. For simplicity, we choose to approximate the similarity function as

    f(x_i, y_j) = exp(−α × d_n(x_i, y_j)^4)    (15)

As can be seen in Figure 4, this curve is flatter and covers approximately the full range of possible values for c.

In Equation 15, α can be used to tune the shape of the final function and roughly steers its slope; we achieved the best results with α = 9 and keep this value throughout all experiments.

In the next section, the robustness of this function in a real image retrieval system will be evaluated.
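A minimal sketch of this scoring function, written with the decaying sign that matches the curve in Figure 4 (a score near 1 at d_n ≈ 0, falling to essentially 0 around d_n ≈ 1):

```python
import math

def feature_similarity(d_n, alpha=9.0):
    """Eq. (15)-style score f = exp(-alpha * d_n**4): flat near zero
    distance, long-tailed, effectively zero once d_n reaches 1."""
    return math.exp(-alpha * d_n ** 4)

# Monotone decay over the normalized-distance range.
scores = [round(feature_similarity(d), 4) for d in (0.0, 0.5, 0.8, 1.0)]
```

The quartic exponent keeps the curve flat for confident matches while still suppressing anything near the non-matching regime, which is the smoother alternative to the sharp sigmoids of Figure 4.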
Figure 3. Comparison of our adaptive distance to the Euclidean distance on dataset D: (a) Euclidean distance, (b) query adaptive distance, (c) comparison of the right tails of both distributions. The solid lines are the distance distributions of the matched pairs, and the dotted lines are the distance distributions of non-matched pairs. The green dashed lines denote where the probability of the non-matching distance exceeds 0.1%, i.e., where the non-matching features are very likely to dominate our observation.

Figure 4. Feature similarity evaluated on dataset D. Red lines are the visual similarity for different c evaluated on the simulated data. The blue line is our final similarity function with α = 9.

3.6. Overall method

In this section we integrate the query adaptive distance measurement and the similarity function presented before into an image retrieval system.
Let the visual similarity between a query image q = {x_1, ..., x_m} and a database image d = {y_1, ..., y_n} be

    sim(q, d) = Σ_{i=1}^{m} Σ_{j=1}^{n} f(x_i, y_j)    (16)

with f(x_i, y_j) the pairwise feature similarity as in Equation 15. As mentioned before, d_n(x_i, y_j) and N_d(x_i) are estimated using the random set of features.
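Equation 16 is a plain double sum over feature pairs. A small dense sketch with synthetic vectors, where each N_d(x_i) is estimated from a random negative set as in Section 3.4 (the dimensions and set sizes are toy values):

```python
import numpy as np

def image_similarity(q_feats, d_feats, n_d, alpha=9.0):
    """Eq. (16): sum f(x_i, y_j) over all query/database feature
    pairs, with per-query normalizers n_d[i] = N_d(x_i) (Eq. 13)."""
    sim = 0.0
    for x_i, nd in zip(q_feats, n_d):
        d_n = np.linalg.norm(d_feats - x_i, axis=1) / nd
        sim += float(np.exp(-alpha * d_n ** 4).sum())
    return sim

rng = np.random.default_rng(2)
q_feats = rng.normal(size=(5, 32))
negatives = rng.normal(size=(100, 32))
n_d = np.array([np.linalg.norm(negatives - x, axis=1).mean() for x in q_feats])

self_score = image_similarity(q_feats, q_feats, n_d)            # 5 exact matches
other_score = image_similarity(q_feats, rng.normal(size=(8, 32)), n_d)
```

In the actual system this sum is of course never evaluated densely; the inverted file described next restricts it to candidate features in the scanned lists.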
For retrieval, we use a standard bag-of-words inverted file. However, in order to have an estimation of the pairwise distance d(x_i, y_j) between query and database features, we add a product quantization scheme as in [12] and select the same parameters as the original authors. The feature space is first partitioned into N_c = 20,000 Voronoi cells according to a coarse quantization codebook K_c. All features located in the same Voronoi cell are grouped into the same inverted list. Each feature is further quantized with respect to its coarse quantization centroid. That is, the residual between the feature and its closest centroid is split equally into m = 8 parts, and each part is separately quantized according to a product quantization codebook K_p with N_p = 256 centroids. Then, each feature is encoded using its related image identifier and a set of quantization codes, and is stored in its corresponding inverted list.
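The encoding step can be sketched as follows. This is a toy stand-in, not the paper's pipeline: the codebooks are random rather than k-means-trained, and the sizes are shrunk from the paper's N_c = 20,000, m = 8, N_p = 256:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, N_C, N_P = 16, 4, 32, 256   # toy dimensions and codebook sizes

K_c = rng.normal(size=(N_C, D))          # stand-in coarse codebook (untrained)
K_p = rng.normal(size=(M, N_P, D // M))  # stand-in product codebooks (untrained)

def encode(x):
    """Coarse-quantize x, then product-quantize its residual in M chunks."""
    c = int(np.argmin(np.linalg.norm(K_c - x, axis=1)))
    residual = (x - K_c[c]).reshape(M, D // M)
    codes = [int(np.argmin(np.linalg.norm(K_p[m] - residual[m], axis=1)))
             for m in range(M)]
    return c, codes  # inverted-list id + compact per-chunk codes

def decode(c, codes):
    """Approximate reconstruction used when estimating d(x_i, y_j)."""
    return K_c[c] + np.concatenate([K_p[m][k] for m, k in enumerate(codes)])

x = rng.normal(size=D)
c, codes = encode(x)
x_hat = decode(c, codes)
```

Product-quantizing the residual rather than the raw vector is what lets a short code refine matching inside a single Voronoi cell.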
We select random features from Flickr and add 100 of them to each inverted list. For performance reasons, we make sure that the random features are added to the inverted lists before adding the database vectors.

At query time, all inverted lists whose related coarse quantization centers are among the k nearest neighbors of the query vector are scanned.
With our indexing scheme, the distances to non-matching features are always computed first, and their mean value directly yields N_d(x_i). Then, the query adaptive distance d_n(x_i, y_j) to each database vector can directly be computed as in Equation 13. In order to reduce unnecessary computation even further, a threshold β is used to quickly drop features whose Euclidean distance is larger than β × N_d(x_i). This parameter has little influence on the retrieval performance but reduces the computational load significantly. Its influence is evaluated in Section 4.
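The pruning rule can be sketched as below. The value β = 1.2 is an illustrative stand-in (the paper evaluates β's influence in Section 4); a cutoff there is harmless because the Equation 15 score at d_n = 1.2 is already around exp(−9 × 1.2⁴) ≈ 10⁻⁸:

```python
import numpy as np

def score_candidates(d, n_d, alpha=9.0, beta=1.2):
    """Score database features against one query feature: drop any
    candidate with d > beta * N_d(x_i), apply the Eq. (15)-style
    score exp(-alpha * d_n**4) to the rest."""
    keep = d <= beta * n_d
    scores = np.zeros(d.shape)
    scores[keep] = np.exp(-alpha * (d[keep] / n_d) ** 4)
    return scores, keep

# Euclidean distances of four candidates to one query feature,
# whose estimated N_d(x_i) is 1.0 (synthetic values).
d = np.array([0.3, 0.9, 1.0, 2.5])
scores, keep = score_candidates(d, n_d=1.0)
```

Pruned candidates simply contribute zero to the double sum of Equation 16, so the threshold trades a negligible score change for a large reduction in distance evaluations.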
As pointed out by [9], local features of an image tend to occur in bursts. In order to avoid multiple counting of statistically correlated features, we incorporate both "intra burstiness" and "inter burstiness" normalization [9] to re-weight the contributions of every pair of features. The similarity function thus changes to

    sim(q, d) = Σ_{i=1}^{m} Σ_{j=1}^{n} w(x_i, y_j) × f(x_i, y_j)    (17)

with w(x_i, y_j) the burstiness weighting.

References (partial)

[12] H. Jégou, M. Douze, and C. Schmid. Product Quantization for Nearest Neighbor Search. IEEE TPAMI, 2011.
[14] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 2004.
[15] K. Mikolajczyk and C. Schmid. Scale & Affine Invariant Interest Point Detectors. IJCV, 2004.
[19] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object Retrieval with Large Vocabularies and Fast Spatial Matching. CVPR, 2007.
[24] J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV, 2003.