
Graph-Based Discriminative Learning for Location Recognition
Song Cao Noah Snavely
Cornell University
Abstract
Recognizing the location of a query image by matching
it to a database is an important problem in computer vision,
and one for which the representation of the database is a key
issue. We explore new ways for exploiting the structure of a
database by representing it as a graph, and show how the
rich information embedded in a graph can improve a bag-
of-words-based location recognition method. In particular,
starting from a graph on a set of images based on visual
connectivity, we propose a method for selecting a set of sub-
graphs and learning a local distance function for each using
discriminative techniques. For a query image, each database
image is ranked according to these local distance functions
in order to place the image in the right part of the graph. In
addition, we propose a probabilistic method for increasing
the diversity of these ranked database images, again based
on the structure of the image graph. We demonstrate that our
methods improve performance over standard bag-of-words
methods on several existing location recognition datasets.
1. Introduction
Location recognition—determining where an image was
taken—is an important problem in computer vision. How-
ever, there is no single definition for what it means to be
a location, and, accordingly, a wide variety of representa-
tions for places have been used in research: Are places, for
instance, a set of distinct landmarks, each represented by a
set of images? [33, 16] Are places latitude and longitude
coordinates, represented with a set of geotagged images?
[11] Should places be represented with 3D geometry, from
which we can estimate an explicit camera pose for a query
image? [17, 25, 18] This question of representation has ana-
logues in more general object recognition problems, where
many approaches regard objects as belonging to pre-defined
categories (cars, planes, bottles, etc.), but other work repre-
sents objects more implicitly as structural relations between
images, encoded as a graph (as in the Visual Memex [19]).
Inspired by this latter work, our paper addresses the loca-
tion recognition problem by representing places as graphs
encoding relations between images, and explores how this
Figure 1.
A segment of an example image matching graph with
three clusters defined by representative images A, B and C.
Nodes in this graph are images, and edges connect overlapping
images. In order to match a new query image to the graph, our
method learns local distance functions for a set of neighborhoods
that cover the graph, for instance, the neighborhoods centered at
nodes A, B, and C, circled with colored boundaries. Given a query
image, we match to the graph using these learned neighborhood
models, rather than considering database images individually. Each
neighborhood has its own distinctive features, and our goal is to
learn and use them to aid recognition.
representation can aid in recognition. In our case, graphs
represent visual overlap between images—nodes correspond
to images, and edges to overlapping, geometrically consis-
tent image pairs—leveraging recent work on automatically
building image graphs (and 3D models) from large-scale
image collections [1, 7, 4, 3]. An example image graph for
photos of the town of Dubrovnik is shown in Figure 1. Given
an image graph, our goal is to take a query image and plug it
in to the graph in the right place, in effect recognizing its lo-
cation. The idea is that the structure inherent in these graphs
encodes much richer information than the set of database
images alone, and that utilizing this structural information
can result in better recognition methods.
We make use of this structural information in a bag-of-
words-based location recognition framework, in which we
take a query image, retrieve similar images in the database,
and perform detailed matching to verify each retrieved image

until a match is found. While others have used image graphs
in various settings before (especially in 3D reconstruction),
our main contribution is to introduce two new ways to ex-
ploit the graph’s structure in recognition. First, we build local
models of what it means to be similar to each neighbor-
hood of the graph (Figure 1). To do so, we use the graph’s
structure to define sets of images that are similar, and sets
that are different, and use discriminative learning techniques
to compute local distance functions tuned to specific parts
of the graph. Second, we use the connectivity of the graph
to encourage diversity in the set of results, using a proba-
bilistic algorithm to retrieve a shortlist of similar images that
are more likely to have at least one match. We show that
our graph-based approach results in improvements over bag-
of-words retrieval methods, and yields performance that is
close to more expensive direct feature matching techniques
on existing location recognition datasets.
2. Related Work
Image retrieval and supervised learning.
As with other
location recognition approaches [27, 12, 14, 26], our work
uses an image-retrieval-based framework using a bag-of-
words model for a database of images. However, our goal
is not retrieval per se (i.e., to retrieve all related instances
of a query image), but instead recognition, where we aim
to determine where an image was taken (for which a single
correctly retrieved database image can be sufficient).
Our work uses supervised learning to improve on such
methods. Prior work has also used various forms of su-
pervision to improve bag-of-words-style methods for both
retrieval and recognition. One type of supervision is based
on geolocation; images that are physically close—on the
same street, say—should also be closer in terms of their
image distance than images across the city or the globe.
Geolocation cues have been used to reweight different vi-
sual words based on their geographic frequency [27, 14], or
to find patches that discriminate different cities [6]. Other
methods rely on image matching to identify good features,
as we do. Turcot and Lowe [31] perform feature matching
on database images to find reliable features. Arandjelovic
and Zisserman propose discriminative query expansion in
which a per-query-image distance metric is learned based
on feedback from image retrieval [2]. Mikulik et al. use
image matches to compute global correlations between
visual words [21]. In contrast, we use discriminative learning
to learn a set of local distance metrics for the database as a
pre-process (rather than at query time), leveraging the known
graph structure of the database images.
Representing places.
Places in computer vision are often
represented as sets of images (e.g., the Eiffel Tower can be
represented with a collection of photos [33]). However, many
other representations of places have been explored. Some
methods use iconic images to represent sets of images taken
from very similar viewpoints [15, 13]. Other approaches use
3D point clouds, derived from structure from motion, as a
richer geometric representation of a place [17, 24]. Closer
to our approach are methods that explicitly construct and
exploit image graphs. For instance, Torii et al. download
Google Streetview images to form a recognition database,
and leverage the underlying Street View image network; in
their approach, they take linear combinations of neighboring
images (in bag-of-words space) to more accurately recognize
the continuum of possible viewpoints [30]. Li et al. use a
visibility graph connecting images and 3D points in a structure-
from-motion model to reason about point co-occurrence for
location recognition [18]. A main contribution of our ap-
proach is to combine the power of discriminative learning
methods with the rich structural information in an image
graph, in order to learn a better database representation and
to better analyze results at query time.
3. Graph-based Location Recognition
We base our algorithm on a standard bag-of-words frame-
work [29], with images represented as L2-normalized
histograms of visual words, using a large vocabulary trained
from SIFT descriptors. Our problem takes as input a database
of images I represented as bag-of-words vectors, and an
image graph G, with a node for each image a ∈ I, and edges
(a, b) connecting overlapping, geometrically consis-
age pairs. Our goal is to take a new query image and predict
which part of the graph this image is connected to, then use
this information to recognize its location.
To achieve this goal, we use the query to retrieve a short-
list of similar database images, and perform detailed match-
ing and geometric verification on the top few matches. Be-
cause our goal is recognition, rather than retrieval, we want
to have at least one correct match appear as close as possible
to the top of the shortlist (rather than retrieve all similar im-
ages). Towards that end, our method improves on the often
noisy raw bag-of-words similarity measure by leveraging
the graph in two ways: (1) we discriminatively learn local
distance functions on neighborhoods of the image graph
(Section 3.2), and (2) we use the graph to generate a ranked
list that encourages more diverse results (Section 3.3).
3.1. Image Matching Graphs
We construct an image graph for the database using a
standard image matching pipeline [1]: we extract features
from each image, and, for a set of image pairs, find nearest
neighbor features and perform RANSAC-based geometric
verification. These matches are sufficient for our method
(though to improve the quality of the matching, we can also
run structure from motion to obtain a point cloud and a
refined set of image correspondences). For each image pair
(a, b) with sufficient inlier matches, we create an edge in
our graph G. We also save the number of inliers N(a, b)
for each image pair to derive edge weights for the graph. In
our experience, the graphs we compute have very few false
edges—almost all of the matching pairs are correct—though
there may be edges missing from the graph because we do
not exhaustively test all possible edges.
In parts of our algorithm, we will threshold edges by their
weights, discarding all edges below a threshold. The edge
weights we define are related to the idea of a Jaccard in-
dex; we define a weight J(a, b) = N(a, b) / (N(a) + N(b) − N(a, b)),
where N(a) and N(b) denote the total number of points seen in
a and b respectively. This measures the similarity of the
two images as the number of features N(a, b) they have
in common, normalized by the union of their feature sets.
This measure ranges from 0 to 1; 0 if no overlap, and 1 if
every feature was matched. This normalization reduces bias
towards images with large numbers of features.
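As an illustration (not from the paper's own code), the edge weighting can be sketched in a few lines of Python; the container names (num_inliers, num_features) and the inlier threshold are assumptions standing in for the matching pipeline's output.

```python
# Sketch: weighted image graph from pairwise match statistics.
# `num_inliers[(a, b)]` holds N(a, b), the verified inlier matches between
# images a and b; `num_features[a]` holds N(a). Both names are hypothetical.

def jaccard_weight(n_ab, n_a, n_b):
    """J(a, b) = N(a, b) / (N(a) + N(b) - N(a, b)), in [0, 1]."""
    denom = n_a + n_b - n_ab
    return n_ab / denom if denom > 0 else 0.0

def build_image_graph(image_ids, num_inliers, num_features, min_inliers=16):
    """Return an adjacency dict {a: {b: J(a, b)}} over verified pairs."""
    graph = {a: {} for a in image_ids}
    for (a, b), n_ab in num_inliers.items():
        if n_ab < min_inliers:          # illustrative inlier threshold
            continue
        w = jaccard_weight(n_ab, num_features[a], num_features[b])
        graph[a][b] = w
        graph[b][a] = w
    return graph
```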
3.2. Graph-based Discriminative Learning
How can we use the information encoded in the graph
to better recognize the location of a query image? We first
address this problem as one of distance (or similarity) metric
learning. There are many possible ways to learn a metric
for the images in the graph. For example, one could take
all the connected pairs in the graph to be positive examples
and the other pairs as negative examples, to learn a single,
global distance metric for a specific dataset [3]. At the other
extreme, one could learn a distance metric for each image in
the database, analogous to how Exemplar SVMs have been
used for object detection [20].
We tried both approaches, but found that we achieved
better performance with an approach somewhere in the middle
of these two extremes. In particular, we divide the graph
into a set of overlapping subgraphs, and learn a separate
distance metric for each of these representative subgraphs.
Our approach, then, consists of the following steps:
At Training Time
1. Compute a covering of the graph with a set of subgraphs.
2. Learn and calibrate an SVM-based distance metric for each subgraph.
At Query Time
3. Use the models in Step 2 to compute the distance from a query image to each database image, and generate a ranked shortlist of possible image matches.
4. Perform geometric verification with the top database images in the shortlist.
We now describe each step in more detail. Later, in Sec-
tion 3.3, we discuss how we improve Step 3 by reranking
the shortlist based on the structure of the graph.
Step 1: Selecting representative neighborhoods.
We start
by covering the graph with a set of representative subgraphs;
afterwards, for each subgraph, we will learn a local similarity
function, using the images in the subgraph as positive exam-
ples, and other, unrelated images in the graph as negative
examples. What makes a good subgraph? We want each
subgraph to contain images that are largely similar, so that
our learning problem has a relatively compact set of positive
example images that can be explained with a simple model.
On the other hand, we also want as many positive examples
as possible, so that our models have enough data from which
to generalize. Finally, we want our subgraphs to completely
cover the graph (i.e., each node is in at least one subgraph),
so that we can build models that apply to any image of the
location modeled in the database.
Based on these criteria, we cover the graph by selecting
a set of representative exemplar images, and defining their
(immediate) neighborhoods as subgraphs in a graph cover,
as illustrated in Figure 1. Formulated this way, the covering
problem becomes one of selecting a set of representative
images that form a dominating set of the graph. For a graph
G, and a set of exemplar images C, we say an image a ∈ I
is covered by C if either a ∈ C, or a is adjacent to an image
in C. If C covers all nodes, then C is a dominating set. We
would like C to be as small as possible, and accordingly, the
neighborhood of each node in C to be as large as possible.
Hence, we seek a minimum dominating set. Such sets have
been used before for 3D reconstruction [10]; here we use
them to define a set of classifiers.
Finding an exact minimum dominating set is an NP-complete
problem. We use a simple greedy algorithm to find an
approximate solution [9]. Starting with an empty set, we
iteratively choose the image that covers the maximum number
of as-yet uncovered images in the graph, until all images are
covered. Figure 2 shows an example image graph for the
Dubrovnik dataset [17] and the exemplar images selected by
our algorithm.
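A minimal Python sketch of this greedy cover, assuming the (thresholded) graph is available as an adjacency dictionary; this is the standard greedy approximation rather than the authors' exact implementation.

```python
def greedy_dominating_set(graph):
    """Greedy approximate minimum dominating set.

    `graph` maps each image to the set of its neighbors. At each step we
    pick the image covering the most still-uncovered images (itself plus
    its neighbors), until every image is covered.
    """
    uncovered = set(graph)
    exemplars = []
    while uncovered:
        best = max(graph, key=lambda c: len(({c} | set(graph[c])) & uncovered))
        exemplars.append(best)
        uncovered -= {best} | set(graph[best])
    return exemplars
```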
Step 2a: Discriminative learning on neighborhoods.
For
each neighborhood selected in Step 1, the next step is to
learn a classifier that will take a new image, and classify it
as belonging to that neighborhood or not. We learn these
classifiers using standard linear SVMs on bag-of-words his-
tograms, one for each neighborhood, and calibrate the set
of SVMs as described in Step 2b; at query time, these clas-
sifiers will be used to define a set of similarity functions
for ranking the database images given a query image. This
use of classifiers for ranking has found many applications in
vision and machine learning, for instance in image retrieval
using local distance functions [8] or Exemplar SVMs [28].
First, for each neighborhood around an exemplar node c ∈ C,
we must define a set of positive and negative example
images as training data for the SVM. To define the positive
set, we simply use the images in the neighborhood. For
this task, we found that thresholding the edges in the graph
by their weight—applying a stricter definition of connectivity,
and yielding more compact neighborhoods—yielded better
classifiers than using all edges found by the image matching
procedure.

Figure 2. Image matching graph for the Dubrovnik dataset.
This graph contains 6,844 images; the large, red nodes denote
representative images selected by our covering algorithm (478 images
in total). Although the set of representative images is much
smaller than the entire collection, their neighborhoods cover the
matching graph. For each neighborhood, we learn a classifier for
determining whether a new image belongs to that neighborhood.

To define the negative set for the neigh-
borhood around an exemplar c, we first find a small set of
hard negatives—images with high BoW similarities to c, but
not in its neighborhood. These hard negatives are combined
with other randomly sampled non-neighboring images in the
graph to form a negative set. Here we use the original, as
opposed to thresholded, graph to define connectivity, to mini-
mize the chances of including a false negative in the negative
set. In this way, the image graph G gives us the supervision
necessary to define positives and negatives for learning, just
as geotags have provided a supervisory cue for discrimina-
tive location recognition in previous work [27, 14].
Given the training data for each neighborhood, we learn
a linear SVM to separate neighborhood images from non-
neighborhood images, using the tf-idf weighted, L2-normalized
bag-of-words histograms for each image as features. We
randomly split the training data into training and validation
subsets for parameter selection in training the SVM (more
details in Section 4.2). For each neighborhood centered on
exemplar c, the result of training is an SVM weight vector
w_c and a bias term b_c. Given a new query image, represented
as a bag-of-words vector q, we can compute the decision
value w_c · q + b_c for each exemplar image c.
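For illustration, one neighborhood classifier could be trained as in the sketch below; scikit-learn's LinearSVC stands in for the LIBLINEAR setup used by the authors, and the fixed positive/negative lists are a simplification of the training-set construction described above.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_neighborhood_svm(bow, positives, negatives, C=1.0):
    """Train one linear SVM for a neighborhood.

    `bow[i]` is the tf-idf weighted, L2-normalized BoW vector of image i;
    `positives` are the neighborhood images, `negatives` the hard negatives
    plus randomly sampled non-neighbors. Returns (w_c, b_c).
    """
    X = np.vstack([bow[i] for i in positives] + [bow[i] for i in negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    clf = LinearSVC(C=C)          # C would be chosen on a validation split
    clf.fit(X, y)
    return clf.coef_.ravel(), float(clf.intercept_[0])

# Decision value for a query BoW vector q:  score = w_c @ q + b_c
```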
Figure 3.
Two example query images and their top 5 ranked
results of our method and raw tf-idf retrieval.
For each result, a
green border indicates a correct match, and a red border indicates
an incorrect match. These two example query images are difficult
for BoW retrieval techniques, due to drastically different lighting
conditions (query image 1) and confusing features (rooftops in
query image 2). However, with our discriminatively learned simi-
larity functions, correctly matching images are ranked higher than
with the baseline method.
Step 2b: Calibrating classifier outputs.
Since our classi-
fiers are independently trained, we need to normalize their
outputs before comparing them. To do so, we convert the de-
cision value of each SVM classifier into a probability value,
using Platt’s method [23] on the whole set of training data.
For a neighborhood around exemplar c, and a query image
vector q, we refer to this probability value as P_c(q).
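A minimal stand-in for this calibration step is sketched below: a one-dimensional logistic fit on decision values and labels, which produces the sigmoid form that Platt scaling yields (the exact fitting procedure in [23] differs in its details).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sigmoid_calibration(decision_values, labels):
    """Map raw SVM decision values to probabilities via a sigmoid fit."""
    lr = LogisticRegression()
    lr.fit(np.asarray(decision_values).reshape(-1, 1), np.asarray(labels))
    A = lr.coef_.ravel()[0]
    B = lr.intercept_[0]
    return lambda s: 1.0 / (1.0 + np.exp(-(A * s + B)))

# P_c(q) = calibrated_c(w_c @ q + b_c), now comparable across neighborhoods.
```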
Step 3: Generating a ranked list of database images.
For a query image represented as a BoW vector q, we can now
compute a probability of q belonging to the neighborhood of
each exemplar image c. Using these values, it is straightforward
to generate a ranked list of the exemplar images c ∈ C by
sorting by P_c(q) in decreasing order. However, we found
that just verifying the query image against exemplar images
sometimes failed simply because the exemplar images rep-
resent a much sparser set of viewpoints than the full graph.
Hence, we would like to create a ranked list of all database
images. To do so, we take the sorted set of neighborhoods
given by the probability values, and then we sort the images
within each neighborhood by their original tf-idf similarity.
We then concatenate these per-neighborhood sorted lists;
since a database image can appear in multiple overlapping
neighborhoods (see Figure 1), in the final list it appears only
in the list of the best-ranked neighborhood. This results in a
ranking of the entire list of database images.
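A sketch of this ranking step, assuming the calibrated neighborhood probabilities and the raw tf-idf similarities for the query have already been computed (the dictionary layout is illustrative).

```python
def rank_database_images(prob_c, tfidf_sim, neighborhoods):
    """Concatenate per-neighborhood rankings into one global ranking.

    `prob_c[c]`        : P_c(q), calibrated probability for exemplar c.
    `tfidf_sim[a]`     : raw tf-idf similarity of database image a to the query.
    `neighborhoods[c]` : images in the neighborhood of exemplar c.
    Neighborhoods are visited in decreasing P_c(q); within each, images are
    sorted by tf-idf similarity; each image appears only once, in the list
    of its best-ranked neighborhood.
    """
    ranked, seen = [], set()
    for c in sorted(prob_c, key=prob_c.get, reverse=True):
        for a in sorted(neighborhoods[c], key=lambda i: tfidf_sim[i], reverse=True):
            if a not in seen:
                seen.add(a)
                ranked.append(a)
    return ranked
```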
Step 4: Geometric verification.
Finally, using the ranking
of database images from Step 3, we perform feature match-
ing and RANSAC-based geometric verification between the
query image and each of the images in the shortlist in turn,
until we find a true match. If we have a 3D structure from
motion model, we can then associate 3D points with matches

in the query image, and determine its pose [18]. If not, we
can associate the location of the matching database image as
the approximate location of the query image. Because fea-
ture matching and verification is relatively computationally
intensive, the quality of the ranking from Step 3 highly im-
pacts the efficiency of the system—ideally, a correct match
will be among the top few matches, if not the first match.
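The verification loop itself can be as simple as the following OpenCV-based sketch; the feature loader, ratio-test threshold, and inlier cutoff are all illustrative assumptions rather than the paper's exact settings.

```python
import cv2
import numpy as np

def verify_shortlist(query_kp, query_desc, shortlist, load_features, min_inliers=16):
    """Walk down the shortlist until one image passes geometric verification."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    for a in shortlist:
        kp_a, desc_a = load_features(a)               # hypothetical feature loader
        pairs = matcher.knnMatch(query_desc, desc_a, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
        if len(good) < 8:                             # need 8+ points for the F-matrix
            continue
        pts_q = np.float32([query_kp[m.queryIdx].pt for m in good])
        pts_a = np.float32([kp_a[m.trainIdx].pt for m in good])
        _, mask = cv2.findFundamentalMat(pts_q, pts_a, cv2.FM_RANSAC, 3.0, 0.99)
        if mask is not None and int(mask.sum()) >= min_inliers:
            return a                                  # first verified match
    return None
```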
Using this simple approach, we observe improvements
in our ranked lists over raw BoW retrieval results, as shown
in the examples in Figure 3. In particular, the top image in
the ranked list is more often correct. However, when the
top ranked cluster is incorrect, this method has the effect
of saturating the top shortlist with similar images that are
all wrong—there is a lack of diversity in the list, with the
second-best cluster pushed further down the list. To avoid
this, we propose several methods to encourage a diverse
shortlist of images.
3.3. Improving the Shortlist
In this section, we first introduce a probabilistic method
that uses the graph to introduce more diversity into the short-
list, increasing the likelihood of finding a correct match
among the top few retrieved images. Second, we demon-
strate several techniques to introduce regularization using
BoW ranking to further improve recognition performance.
Probabilistic Reranking.
Our problem is akin to the well-
known Web search ranking problem (as opposed to standard
image retrieval). Rather than retrieve all instances relevant
to a given query, we want to retrieve a small set of results
that are both relevant and diverse (see Figure 4 for an ex-
ample), so as to cover multiple possible hypotheses—just
as a Web search for the term “Michael Jordan” might pro-
ductively return results for both the basketball player and
the machine learning researcher. While introducing diver-
sity in Web search has been studied in the machine learning
literature [32], we are unaware of it being used in location
recognition; in our problem, it is the automatic verification
procedure that is examining results, rather than a human.
To introduce diversity, we propose a probabilistic approach
for reranking the shortlist. The idea is, in some ways, the
converse of query expansion on positive matches to increase
recall in image retrieval. In our case, we use negative evi-
dence to increase the pool of diverse matches. For instance,
in the case where the first retrieved image is not a match to
the query, we want to select the second image conditioned
on this outcome, perhaps selecting an image dissimilar to
this first match (and similarly for the third image conditioned
on the first two being incorrect). How can we compute such
conditional probabilities? We again turn to the image graph.
First, some terminology. For a database image a, we define a
random variable X_a representing the event that the query image
matches image a; X_a = 1 if image a is a match, and 0 otherwise.
Thus, using the notation above, P_c = P(X_c = 1) for an exemplar
image c, and similarly P_a = P(X_a = 1) for any database image,
using the simple heuristic above that a non-exemplar database
image takes the maximum probability of all neighborhoods it
belongs to. As before, we choose the database image a with the
highest P_a as the top-ranked image. However, to select the
second ranked image, we are instead more interested in the
conditional probability P'_b = P(X_b = 1 | X_a = 0) than its
raw appearance-based probability P(X_b = 1) alone. We can
compute this conditional probability as:
$$
\begin{aligned}
P'_b = P(X_b = 1 \mid X_a = 0)
  &= \frac{P(X_b = 1,\, X_a = 0)}{P(X_a = 0)}
   = \frac{P(X_b = 1) - P(X_b = 1,\, X_a = 1)}{1 - P(X_a = 1)} \\
  &= \frac{P_b - P(X_b = 1 \mid X_a = 1)\, P(X_a = 1)}{1 - P_a}
   = \frac{P_b - P_{ba} P_a}{1 - P_a}
   = P_b \left( \frac{1 - \tfrac{P_{ba}}{P_b} P_a}{1 - P_a} \right)
\qquad (1)
\end{aligned}
$$
where P_ba = P(X_b = 1 | X_a = 1) denotes the conditional
probability that image b matches the query given that image a
matches. The last line in the derivation above relates P'_b to
P_b via an update factor, (1 − (P_ba / P_b) P_a) / (1 − P_a), that
depends on P_a (the probability that the top ranked image matches)
and P_ba (a conditional probability). We use the image graph to
estimate P_ba, the intuition being that the more similar b is to
a—i.e., the stronger the connection between a and b in the
graph—the higher P_ba should be. In particular, we estimate P_ba
as N(a, b) / N(a), the ratio of the number of shared features
between a and b divided by the total number of feature points in
a. Note that in general P_ab ≢ P_ba, i.e., this similarity measure
is asymmetric. These measures are precomputed, along with the
Jaccard indices J(a, b) described in Section 3.1.
The update factor in Eq. (1) has an intuitive interpretation:
if image b is very similar to image a according to the graph
(i.e., P_ba is large), then its probability score is downweighted
(because if a is an incorrect match, then b is also likely
incorrect). On the other hand, if b is not connected to a, its
score will tend to be boosted. However, we do not want to
apply this update too quickly, for fear of downweighting many
images based on the evidence of a single mismatch. To regulate
this factor, we introduce a parameter α, and define a regularized
update factor (1 − α (P_ba / P_b) P_a) / (1 − α P_a). If α = 0,
the update has no influence on the ranking result, and when
α = 1, it has its full effect. We use α = 0.9 in our experiments.
We iteratively choose the image b with the highest updated score
P'_b and recalculate scores using (1).
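A sketch of this reranking loop follows; treating each selected image's current (already updated) score as P_a, and clipping scores at zero, are implementation choices layered on top of Eq. (1), not details given in the paper.

```python
def probabilistic_rerank(prob, prob_cond, shortlist_size=20, alpha=0.9):
    """Diversify the shortlist using the regularized update factor.

    `prob[a]`          : calibrated probability that database image a matches.
    `prob_cond[(b, a)]`: P_ba, estimated as N(a, b) / N(a) (0 if no edge).
    After selecting image a with score P_a, every remaining score P_b becomes
    max(0, (P_b - alpha * P_ba * P_a) / (1 - alpha * P_a)).
    """
    scores = dict(prob)
    reranked = []
    while scores and len(reranked) < shortlist_size:
        a = max(scores, key=scores.get)
        p_a = scores.pop(a)
        reranked.append(a)
        denom = 1.0 - alpha * p_a
        if denom <= 0:
            continue
        for b in scores:
            p_ba = prob_cond.get((b, a), 0.0)
            scores[b] = max(0.0, (scores[b] - alpha * p_ba * p_a) / denom)
    return reranked
```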
BoW Regularization.
Our learned discriminative models
often perform well, but we observed that for some rare query
images, our models consistently perform poorly (perhaps due
