Group Nearest Neighbor Queries
Dimitris Papadias, Qiongmao Shen, Yufei Tao§, Kyriakos Mouratidis
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
{dimitris, qmshen, kyriakos}@cs.ust.hk
§ Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong
taoyf@cs.cityu.edu.hk
Abstract
Given two sets of points P and Q, a group nearest neighbor (GNN) query retrieves the point(s) of P with the smallest sum of distances to all points in Q. Consider, for instance, three users at locations q1, q2 and q3 that want to find a meeting point (e.g., a restaurant); the corresponding query returns the data point p that minimizes the sum of Euclidean distances |pqi| for 1 ≤ i ≤ 3. Assuming that Q fits in memory and P is indexed by an R-tree, we propose several algorithms for finding the group nearest neighbors efficiently. As a second step, we extend our techniques to situations where Q cannot fit in memory, covering both indexed and non-indexed query points. An experimental evaluation identifies the best alternative based on the data and query properties.
1. Introduction
Nearest neighbor (NN) search is one of the oldest problems
in computer science. Several algorithms and theoretical
performance bounds have been devised for exact and
approximate processing in main memory [S91, AMN+98].
Furthermore, the application of NN search to content-based
and similarity retrieval has led to the development of
numerous cost models [PM97, WSB98, BGRS99, B00] and
indexing techniques [SYUK00, YOTJ01] for high-
dimensional versions of the problem. In spatial databases
most of the work has focused on the point NN query that
retrieves the k (≥1) objects from a dataset P that are closest
(usually according to Euclidean distance) to a query point
q. The existing algorithms (reviewed in Section 2) assume
that P is indexed by a spatial access method and utilize
some pruning bounds to restrict the search space. Shahabi
et al. [SKS02] and Papadias et al. [PZMT03] deal with
nearest neighbor queries in spatial network databases,
where the distance between two points is defined as the
length of the shortest path connecting them in the network.
In addition to conventional (i.e., point) NN queries, recently
there has been an increasing interest in alternative forms of
spatial and spatio-temporal NN search. Ferhatosmanoglu et
al. [FSAA01] discover the NN in a constrained area of the
data space. Korn and Muthukrishnan [KM00] discuss
reverse nearest neighbor queries, where the goal is to
retrieve the data points whose nearest neighbor is a
specified query point. Korn et al. [KMS02] study the same
problem in the context of data streams. Given a query
moving with steady velocity, [SR01, TP02] incrementally
maintain the NN (as the query moves), while [BJKS02,
TPS02] propose techniques for continuous NN processing,
where the goal is to return all results up to a future time.
Kollios et al. [KGT99] develop various schemes for
answering NN queries on 1D moving objects. An overview
of existing NN methods for spatial and spatio-temporal
databases can be found in [TP03].
In this paper we discuss group nearest neighbor (GNN) queries, a novel form of NN search. The input of the problem consists of a set P={p1,…,pN} of static data points in multidimensional space and a group of query points Q={q1,…,qn}. The output contains the k (≥1) data point(s) with the smallest sum of distances to all points in Q. The distance between a data point p and Q is defined as dist(p,Q) = ∑i=1~n |pqi|, where |pqi| is the Euclidean distance between p and query point qi. As an example, consider a database that manages (static) facilities (i.e., dataset P). The query contains a set of user locations Q={q1,…,qn} and the result returns the facility that minimizes the total travel distance for all users. In addition to its relevance in geographic information systems and mobile computing applications, GNN search is important in several other domains. For instance, in clustering [JMF99] and outlier detection [AY01], the quality of a solution can be evaluated by the distances between the points and their nearest cluster centroid. Furthermore, the operability and speed of very large circuits depend on the relative distances between their various components; GNN can be applied to detect abnormalities and guide the relocation of components [NO97].
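For concreteness, the following Python sketch (our illustration, not part of the paper) computes dist(p,Q) and answers a GNN query by brute force; the R-tree algorithms of Sections 3 and 4 exist precisely to avoid this exhaustive scan.

import math

def dist(p, Q):
    # dist(p,Q) = sum of Euclidean distances |p qi| over all query points qi
    return sum(math.hypot(p[0] - qx, p[1] - qy) for qx, qy in Q)

def brute_force_gnn(P, Q, k=1):
    # O(|P|·|Q|) baseline: rank every data point by its total distance to Q
    return sorted(P, key=lambda p: dist(p, Q))[:k]

P = [(1, 1), (4, 5), (9, 2)]
Q = [(2, 2), (6, 3), (5, 7)]
print(brute_force_gnn(P, Q))  # the point of P minimizing dist(p,Q)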
Assuming that Q fits in memory and P is indexed by an R-tree, we first propose three algorithms for solving this problem. Then, we extend our techniques to cases where Q is too large to fit in memory, covering both indexed and non-indexed query points. The rest of the paper is structured as follows. Section 2 outlines related work on conventional nearest neighbor search and top-k queries. Section 3 describes algorithms for the case where Q fits in memory, and Section 4 for the case where Q resides on disk. Section 5 experimentally evaluates the algorithms and identifies the best one depending on the problem characteristics. Section 6 concludes the paper with directions for future work.
2. Related work
Following most approaches in the relevant literature, we assume 2D data points indexed by an R-tree [G84]. The proposed techniques, however, are applicable to higher dimensions and other data-partitioning access methods such as A-trees [SYUK00]. Figure 2.1 shows an R-tree for point set P={p1,p2,…,p12}, assuming a capacity of three entries per node. Points that are close in space (e.g., p1, p2, p3) are clustered in the same leaf node (N3). Nodes are then recursively grouped together according to the same principle until the top level, which consists of a single root.
Existing algorithms for point NN queries using R-trees follow the branch-and-bound paradigm, utilizing some metrics to prune the search space. The most common such metric is mindist(N,q), which corresponds to the closest possible distance between q and any point in the subtree of node N. Figure 2.1a shows the mindist between point q and nodes N1, N2. Similarly, mindist(N1,N2) is the minimum possible distance between any two points that reside in the sub-trees of nodes N1 and N2.
(a) Points and node extents (b) The corresponding R-tree
Figure 2.1: Example of an R-tree and a point NN query
The first NN algorithm for R-trees [RKV95] searches the tree in a depth-first (DF) manner. Specifically, starting from the root, it visits the node with the minimum mindist from q (e.g., N1 in Figure 2.1). The process is repeated recursively until the leaf level (node N4), where the first potential nearest neighbor is found (p5). During backtracking to the upper level (node N1), the algorithm only visits entries whose minimum distance is smaller than the distance of the nearest neighbor already retrieved. In the example of Figure 2.1, after discovering p5, DF will backtrack to the root level (without visiting N3), and then follow the path N2, N6, where the actual NN p11 is found.
The DF algorithm is sub-optimal, i.e., it accesses more nodes than necessary. In particular, as proven in [PM97], an optimal algorithm should visit only nodes intersecting the vicinity circle that centers at the query point q and has radius equal to the distance between q and its nearest neighbor. In Figure 2.1a, for instance, an optimal algorithm should visit only nodes R, N1, N2, and N6 (whereas DF also visits N4). The best-first (BF) algorithm of [HS99] achieves the optimal I/O performance by maintaining a heap H with the entries visited so far, sorted by their mindist. As with DF, BF starts from the root, and inserts all the entries into H (together with their mindist), e.g., in Figure 2.1a, H={<N1, mindist(N1,q)>, <N2, mindist(N2,q)>}. Then, at each step, BF visits the node in H with the smallest mindist. Continuing the example, the algorithm retrieves the content of N1 and inserts all its entries in H, after which H={<N2, mindist(N2,q)>, <N4, mindist(N4,q)>, <N3, mindist(N3,q)>}. Similarly, the next two nodes accessed are N2 and N6 (inserted in H after visiting N2), in which p11 is discovered as the current NN. At this time, the algorithm terminates (with p11 as the final result) since the next entry (N4) in H is farther (from q) than p11. Both DF and BF can be easily extended for the retrieval of k>1 nearest neighbors. In addition, BF is also incremental. Namely, it reports the nearest neighbors in ascending order of their distance to the query, so that k does not have to be known in advance (allowing different termination conditions to be used).
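The following Python sketch illustrates incremental best-first NN search over a simplified in-memory tree (the Node structure is our assumption; [HS99] operates on R-tree pages fetched from disk). Later sketches in this section reuse Node, mindist_point_rect and best_first_nn.

from dataclasses import dataclass, field
import heapq, itertools, math

@dataclass
class Node:
    rect: tuple = None                    # ((x1, y1), (x2, y2)) MBR
    children: list = field(default_factory=list)
    point: tuple = None                   # set only for leaf entries

def mindist_point_rect(p, rect):
    # minimum possible distance between point p and any point in rect
    (x1, y1), (x2, y2) = rect
    dx = max(x1 - p[0], 0.0, p[0] - x2)
    dy = max(y1 - p[1], 0.0, p[1] - y2)
    return math.hypot(dx, dy)

def best_first_nn(root, q):
    # yield data points in ascending distance from q (incremental NN)
    tie = itertools.count()               # tie-breaker so Nodes are never compared
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, e = heapq.heappop(heap)
        if e.point is not None:
            yield e.point, d              # the next nearest neighbor
        else:
            for c in e.children:
                key = (math.dist(q, c.point) if c.point is not None
                       else mindist_point_rect(q, c.rect))
                heapq.heappush(heap, (key, next(tie), c))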
The branch-and-bound framework also applies to closest pair queries that find the pair of objects from two datasets, such that their distance is the minimum among all pairs. [HS98, CMTV00] propose various algorithms based on the concepts of DF and BF traversal. The difference from NN is that the algorithms access two index structures (one for each dataset) simultaneously. If the mindist of two intermediate nodes Ni and Nj (one from each R-tree) is already greater than the distance of the closest pair of objects found so far, the sub-trees of Ni and Nj cannot contain a closest pair (thus, the pair is pruned).
As shown in the next section, a processing technique for GNN queries applies multiple conventional NN queries (one for each query point) and then combines their results. Some related work on this topic has appeared in the literature of top-k (or ranked) queries over multiple data repositories (see [FLN01, BCG02, F02] for representative papers). As an example, consider that a user wants to find the k images that are most similar to a query image, where similarity is defined according to n features, e.g., color histogram, object arrangement, texture, shape, etc. The query is submitted to n retrieval engines that return the best matches for particular features together with their similarity scores, i.e., the first engine will output a set of matches according to color, the second according to arrangement, and so on. The problem is to combine the multiple inputs in order to determine the top-k results in terms of their overall similarity.
The main idea behind all techniques is to minimize the extent and cost of the search performed on each retrieval engine in order to compute the final result. The threshold algorithm [FLN01] works as follows (assuming retrieval of the single best match): the first query is submitted to the first search engine, which returns the closest image p1 according to the first feature. The similarity between p1 and the query image with respect to the other features is computed. Then, the second query is submitted to the second search engine, which returns p2 (the best match according to the second feature). The overall similarity of p2 is also computed, and the best of p1 and p2 becomes the current result. The process is repeated in a round-robin fashion, i.e., after the last search engine is queried, the second match is retrieved with respect to the first feature, and so on. The algorithm terminates when the similarity of the current result is higher than the similarity that can be achieved by any subsequent solution. In the next section we adapt this approach to GNN processing.
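A minimal sketch of the threshold algorithm under simplifying assumptions of our own: each engine is a list of (object, score) pairs sorted by descending per-feature similarity, overall similarity is the sum of per-feature scores, and score_of provides the random-access lookups.

def threshold_algorithm(engines, score_of):
    # engines: one sorted list of (object, score) pairs per feature
    # score_of(obj, f): similarity of obj on feature f (random access)
    n = len(engines)
    best_obj, best_score = None, float("-inf")
    last_seen = [float("inf")] * n        # score frontier per feature
    for depth in range(max(len(e) for e in engines)):
        for f in range(n):                # round-robin over the n engines
            if depth < len(engines[f]):
                obj, s = engines[f][depth]
                last_seen[f] = s
                total = sum(score_of(obj, g) for g in range(n))
                if total > best_score:
                    best_obj, best_score = obj, total
        if best_score >= sum(last_seen):  # no unseen object can beat the
            break                         # current result: terminate
    return best_obj, best_score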
3. Algorithms for memory-resident queries
Assuming that the set Q of query points fits in memory and
that the data points are indexed by an R-tree, we present
three algorithms for processing GNN queries. For each
algorithm we first illustrate retrieval of a single nearest
neighbor, and then show the extension to
k>1. Table 3.1
contains the primary symbols used in our description (some
have not appeared yet, but will be clarified shortly).
Symbol               Description
Q                    set of query points
Qi                   a group of queries that fits in memory
n (ni)               number of queries in Q (Qi)
M (Mi)               MBR of Q (Qi)
q                    centroid of Q
dist(p,Q)            sum of distances between point p and query points in Q
mindist(N,q)         minimum distance between the MBR of node N and centroid q
mindist(p,M)         minimum distance between data point p and query MBR M
∑i ni·mindist(N,Mi)  weighted mindist of node N with respect to all query groups
Table 3.1: Frequently used symbols
3.1 Multiple query method
The multiple query method (MQM) utilizes the main idea of the threshold algorithm, i.e., it performs incremental NN queries for each point in Q and combines their results. For instance, in Figure 3.1 (where Q={q1,q2}), MQM retrieves the first NN of q1 (point p10 with |p10q1|=2) and computes the distance |p10q2| (=5). Similarly, it finds the first NN of q2 (point p11 with |p11q2|=3) and computes |p11q1| (=3). The point (p11) with the minimum sum of distances (|p11q1|+|p11q2|=6) to all query points becomes the current GNN of Q.
For each query point qi, MQM stores a threshold ti, which is the distance of its current NN, i.e., t1=|p10q1|=2 and t2=|p11q2|=3. The total threshold T is defined as the sum of all thresholds (=5). Continuing the example, since T < dist(p11,Q), it is possible that there exists a point in P whose distance to Q is smaller than dist(p11,Q). So MQM retrieves the second NN of q1 (p11, which has already been encountered by q2) and updates the threshold t1 to |p11q1| (=3). Since T (=6) now equals the summed distance between the best neighbor found so far and the points of Q, MQM terminates with p11 as the final result. In other words, every non-encountered point has distance greater than or equal to T (=6), and therefore it cannot be closer to Q (in the global sense) than p11.
Figure 3.1: Example of a GNN query
Figure 3.2 shows the pseudo-code for MQM (1NN), where best_dist (initially ∞) is the distance of the best_NN found so far. In order to achieve locality of the node accesses for individual queries, we sort the points in Q according to their Hilbert value; thus, two subsequent queries are likely to correspond to nearby points and access similar R-tree nodes. The algorithm for computing the nearest neighbors of query points should be incremental (e.g., the best-first search discussed in Section 2) because the termination condition is not known in advance. The extension for the retrieval of k (>1) nearest neighbors is straightforward. The k neighbors with the minimum overall distances are inserted in a list of k pairs <p, dist(p,Q)> (sorted on dist(p,Q)) and best_dist equals the distance of the k-th NN. Then, MQM proceeds in the same way as in Figure 3.2, except that whenever a better neighbor is found, it is inserted in best_NN and the last element of the list is removed.
MQM(Q: group of query points)
/* T: threshold; best_dist: distance of the current NN */
sort points in Q according to Hilbert value;
for each query point: ti = 0;
T = 0; best_dist = ∞; best_NN = null; // initialization
while (T < best_dist)
  get the next nearest neighbor pj of the next query point qi;
  ti = |pj qi|; update T;
  if dist(pj,Q) < best_dist
    best_NN = pj; // update current GNN of Q
    best_dist = dist(pj,Q);
end of while;
return best_NN;
Figure 3.2: The MQM algorithm
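For concreteness, a compact in-memory rendering of MQM reusing best_first_nn from the Section 2 sketch (the Hilbert-order sorting of Q and the k>1 extension are omitted; this is our simplification, not the paper's implementation):

import math

def mqm(root, Q):
    # one incremental NN stream per query point, consumed round-robin
    streams = [best_first_nn(root, q) for q in Q]
    t = [0.0] * len(Q)                        # per-query thresholds t_i
    best_nn, best_dist = None, float("inf")
    i = 0
    while sum(t) < best_dist:                 # T < best_dist
        try:
            p, d = next(streams[i])
        except StopIteration:
            break                             # stream i enumerated all of P,
                                              # so best_nn is already final
        t[i] = d                              # update t_i and hence T
        d_total = sum(math.dist(p, q) for q in Q)   # dist(p, Q)
        if d_total < best_dist:
            best_nn, best_dist = p, d_total
        i = (i + 1) % len(Q)                  # next query point
    return best_nn, best_dist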

3.2 Single point method
MQM may incur multiple accesses to the same node (and retrieve the same data point, e.g., p11) through different queries. To avoid this problem, the single point method (SPM) processes GNN queries with a single traversal. First, SPM computes the centroid q of Q, which is a point in space with a small value of dist(q,Q) (ideally, q is the point with the minimum dist(q,Q)). The intuition behind this approach is that the nearest neighbor is a point of P "near" q. It remains to derive (i) the computation of q, and (ii) the range around q in which we should look for points of P before we conclude that no better NN can be found.
Towards the first goal, let (x,y) be the coordinates of the centroid q and (xi,yi) be the coordinates of query point qi. The centroid q minimizes the distance function:

dist(q,Q) = ∑i=1~n √((x-xi)² + (y-yi)²)

Since the partial derivatives of dist(q,Q) with respect to its independent variables x and y are zero at the centroid q, we have the following equations:

∂dist(q,Q)/∂x = ∑i=1~n (x-xi)/√((x-xi)² + (y-yi)²) = 0
∂dist(q,Q)/∂y = ∑i=1~n (y-yi)/√((x-xi)² + (y-yi)²) = 0
Unfortunately, these equations cannot be solved in closed form for n>2; they must be evaluated numerically, which implies that the centroid is approximate. In our implementation, we use the gradient descent method [HYC01] to quickly obtain a good approximation. Specifically, starting with arbitrary initial coordinates, e.g., x = (1/n)∑i=1~n xi and y = (1/n)∑i=1~n yi, the method repeatedly modifies the coordinates as follows:

x = x - η·∂dist(q,Q)/∂x and y = y - η·∂dist(q,Q)/∂y,

where η is a step size. The process is repeated until the distance function dist(q,Q) converges to a minimum value. Although the resulting point q is only an approximation of the ideal centroid, it suffices for the purposes of SPM. Next we show how q can be used to prune the search space, based on the following lemma.
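A minimal sketch of this numerical step, with a fixed step size η and an iteration count chosen arbitrarily by us (the exact minimizer is the geometric median of Q, which has no closed form for n>2):

import math

def centroid(Q, eta=0.05, iters=2000):
    # gradient descent on dist(q,Q), starting from the arithmetic mean of Q
    x = sum(qx for qx, _ in Q) / len(Q)
    y = sum(qy for _, qy in Q) / len(Q)
    for _ in range(iters):
        gx = gy = 0.0
        for qx, qy in Q:
            d = math.hypot(x - qx, y - qy) or 1e-12   # guard against /0
            gx += (x - qx) / d                        # term of ∂dist/∂x
            gy += (y - qy) / d                        # term of ∂dist/∂y
        x, y = x - eta * gx, y - eta * gy             # x = x - η·∂dist/∂x, etc.
    return (x, y)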
Lemma 1: Let Q={q1,…,qn} be a group of query points and q an arbitrary point in space. The following inequality holds for any point p: dist(p,Q) ≥ n·|pq| - dist(q,Q), where |pq| denotes the Euclidean distance between p and q.

Proof: By the triangle inequality, for each query point qi we have |pqi| + |qiq| ≥ |pq|. Summing the n inequalities:

∑qi∈Q |pqi| + ∑qi∈Q |qiq| ≥ n·|pq|, hence dist(p,Q) ≥ n·|pq| - dist(q,Q). ∎
Lemma 1 provides a threshold for the termination of SPM. In particular, by applying an incremental point NN query at q, we stop when we find the first point p such that n·|pq| - dist(q,Q) ≥ dist(best_NN,Q). By Lemma 1, dist(p,Q) ≥ n·|pq| - dist(q,Q) and, therefore, dist(p,Q) ≥ dist(best_NN,Q). The same idea can be used for pruning intermediate nodes, as summarized by the following heuristic.

Heuristic 1: Let q be the centroid of Q and best_dist be the distance of the best GNN found so far. Node N can be pruned if:

mindist(N,q) ≥ (best_dist + dist(q,Q)) / n

where mindist(N,q) is the minimum distance between the MBR of N and the centroid q. An example of the heuristic is shown in Figure 3.3, where best_dist = 5+4 = 9. Since dist(q,Q) = 1+2 = 3 and n = 2, the right part of the inequality equals (9+3)/2 = 6, meaning that both nodes in the figure will be pruned.
Figure 3.3: Pruning of nodes in SPM
Based on the above observations, it is straightforward to implement SPM using the depth-first or best-first paradigm. Figure 3.4 shows the pseudo-code of DF SPM. Starting from the root of the R-tree (for P), entries are sorted in a list according to their mindist from the query centroid q and are visited (recursively) in this order. Once the first entry with mindist(Nj,q) ≥ (best_dist + dist(q,Q))/n has been found, the subsequent ones in the list are pruned. The extension to k (>1) GNN queries is the same as for conventional (point) NN algorithms.
SPM(Node: R-tree node, Q: group of query points)
/* q: the centroid of Q */
if Node is an intermediate node
  sort entries Nj in Node according to mindist(Nj,q) in list;
  repeat
    get_next entry Nj from list;
    if mindist(Nj,q) < (best_dist+dist(q,Q))/n /* heuristic 1 */
      SPM(Nj,Q); /* recursion */
  until mindist(Nj,q) ≥ (best_dist+dist(q,Q))/n or end of list;
else if Node is a leaf node
  sort points pj in Node according to mindist(pj,q) in list;
  repeat
    get_next entry pj from list;
    if |pj q| < (best_dist+dist(q,Q))/n /* heuristic 1 for points */
      if dist(pj,Q) < best_dist
        best_NN = pj; // update current GNN
        best_dist = dist(pj,Q);
  until |pj q| ≥ (best_dist+dist(q,Q))/n or end of list;
return best_NN;
Figure 3.4: The SPM algorithm
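Using best_first_nn and centroid from the earlier sketches, the Lemma 1 threshold also yields a compact best-first variant of SPM (our simplification; Figure 3.4 is the paper's depth-first version):

import math

def spm(root, Q):
    # scan P in ascending |pq| around the centroid, stop via Lemma 1
    q = centroid(Q)
    n = len(Q)
    dist_qQ = sum(math.dist(q, qi) for qi in Q)
    best_nn, best_dist = None, float("inf")
    for p, d in best_first_nn(root, q):       # points in ascending |pq|
        if n * d - dist_qQ >= best_dist:      # Lemma 1: nothing better remains
            break
        d_total = sum(math.dist(p, qi) for qi in Q)
        if d_total < best_dist:
            best_nn, best_dist = p, d_total
    return best_nn, best_dist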

3.3 Minimum bounding method
Like SPM, the minimum bounding method (MBM) performs a single query, but uses the minimum bounding rectangle M of Q (instead of the centroid q) to prune the search space. Specifically, starting from the root of the R-tree for dataset P, MBM visits only nodes that may contain candidate points. In the sequel, we discuss heuristics for identifying such qualifying nodes.

Heuristic 2: Let M be the MBR of Q, and best_dist be the distance of the best GNN found so far. A node N cannot contain qualifying points if:

mindist(N,M) ≥ best_dist / n

where mindist(N,M) is the minimum distance between M and N, and n is the cardinality of Q. Figure 3.5 shows a group of query points Q={q1,q2} and the best_NN with best_dist=5. Since mindist(N1,M) = 3 > best_dist/2 = 2.5, N1 can be pruned without being visited. In other words, even if there is a data point p at the upper-right corner of N1 and all the query points are at the lower-right corner of Q, it would still be the case that dist(p,Q) > best_dist. The concept of heuristic 2 also applies to leaf entries. When a point p is encountered, we first compute mindist(p,M) from p to the MBR of Q. If mindist(p,M) ≥ best_dist/n, p is discarded since it cannot be closer than the best_NN. In this way we avoid performing the distance computations between p and the points of Q.
Figure 3.5: Example of heuristic 2
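The rectangle distances used by MBM are cheap to compute. The helpers below (our illustration, reusing mindist_point_rect from the Section 2 sketch) implement the rectangle-to-rectangle mindist, the MBR of Q, and the heuristic 2 test in the form dist(p,Q) ≥ n·mindist(N,M):

import math

def mindist_rect_rect(r1, r2):
    # minimum possible distance between any two points of rectangles r1, r2
    (ax1, ay1), (ax2, ay2) = r1
    (bx1, by1), (bx2, by2) = r2
    dx = max(bx1 - ax2, 0.0, ax1 - bx2)
    dy = max(by1 - ay2, 0.0, ay1 - by2)
    return math.hypot(dx, dy)

def mbr(Q):
    # minimum bounding rectangle M of the query points
    xs, ys = [x for x, _ in Q], [y for _, y in Q]
    return ((min(xs), min(ys)), (max(xs), max(ys)))

def passes_heuristic_2(node_rect, M, n, best_dist):
    # every point p in the node is at least mindist(N,M) from each of the
    # n query points, hence dist(p,Q) >= n * mindist(N,M); visit only if
    # this lower bound is below best_dist
    return n * mindist_rect_rect(node_rect, M) < best_dist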
The heuristic incurs minimum overhead, since it requires a single distance computation per node. However, it is not very tight, i.e., it leads to unnecessary node accesses. For instance, node N2 (in Figure 3.5) passes heuristic 2 (and would be visited), although it cannot contain qualifying points. Heuristic 3 presents a tighter bound for avoiding such visits.
Heuristic 3: Let best_dist be the distance of the best GNN found so far. A node N can be safely pruned if:

∑qi∈Q mindist(N,qi) ≥ best_dist

where mindist(N,qi) is the minimum distance between N and query point qi ∈ Q. In Figure 3.5, since mindist(N2,q1) + mindist(N2,q2) = 6 > best_dist = 5, N2 is pruned.
Because heuristic 3 requires multiple distance computations (one for each query point), it is applied only to nodes that pass heuristic 2. Note that (like heuristic 2) heuristic 3 does not represent the tightest condition for successful node visits; i.e., it is possible for a node to satisfy the heuristic and still not contain qualifying points. Consider, for instance, Figure 3.6, which includes 3 query points. The current best_dist is 7, and node N3 passes heuristic 3, since mindist(N3,q1) + mindist(N3,q2) + mindist(N3,q3) = 5. Nevertheless, N3 should not be visited, because the minimum distance that can be achieved by any point in N3 is greater than 7. The dotted lines in Figure 3.6 correspond to the distances between the best possible point p' (not necessarily a data point) in N3 and the three query points.
Figure 3.6: Example of a hypothetical optimal heuristic
Assuming that we could identify the best point p' in a node, we could obtain a tight heuristic as follows: if the distance of p' is smaller than best_dist, visit the node; otherwise, reject it. The combination of the best-first approach with this heuristic would lead to an I/O optimal method (such as the algorithm of [HS99] for conventional NN queries). Finding point p', however, is similar to the problem of locating the query centroid (but this time in a region constrained by the node MBR), which, as discussed in Section 3.2, can only be solved numerically (i.e., approximately). Although an approximation suffices for SPM, the correctness of best_dist requires the precise solution (in order to avoid false misses). As a result, this hypothetical heuristic cannot be applied for exact GNN retrieval.
Heuristics 2 and 3 can be used with both the depth-first and best-first traversal paradigms. For simplicity, we discuss MBM based on depth-first traversal, using the example of Figure 3.7. The root of the R-tree is retrieved and its entries are sorted by their mindist to M. Then, the node (N1) with the minimum mindist is visited, inside which the entry of N4 has the smallest mindist. Points p5, p6, p4 (in N4) are processed according to the value of mindist(pj,M), and p5 becomes the current GNN of Q (best_dist=11). Points p6 and p4 have larger distances and are discarded. When backtracking to N1, the subtree of N3 is pruned by heuristic 2. Thus, MBM backtracks again to the root and visits nodes N2 and N6, inside which p10 has the smallest mindist to M and is processed first, replacing p5 as the GNN (best_dist=7). Then, p11 becomes the best NN (best_dist=6). Finally, N5 is pruned by heuristic 2, and the algorithm terminates with p11 as the final GNN. The extension to the retrieval of kNN and the best-first implementation are straightforward.
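A depth-first MBM sketch combining heuristics 2 and 3, reusing Node, mindist_point_rect, mindist_rect_rect and mbr from the earlier sketches (again our simplified in-memory rendering, with best_NN/best_dist threaded explicitly rather than kept as globals):

import math

def mbm(node, Q, M=None, best=(None, float("inf"))):
    # M is the MBR of Q (computed on the first call)
    M = M or mbr(Q)
    n = len(Q)
    best_nn, best_dist = best
    entries = sorted(node.children,
                     key=lambda c: (mindist_point_rect(c.point, M)
                                    if c.point is not None
                                    else mindist_rect_rect(c.rect, M)))
    for c in entries:
        if c.point is not None:                          # data point
            if n * mindist_point_rect(c.point, M) >= best_dist:
                continue                                 # heuristic 2 (points)
            d = sum(math.dist(c.point, q) for q in Q)
            if d < best_dist:
                best_nn, best_dist = c.point, d
        else:                                            # intermediate node
            if n * mindist_rect_rect(c.rect, M) >= best_dist:
                continue                                 # heuristic 2
            if sum(mindist_point_rect(q, c.rect) for q in Q) >= best_dist:
                continue                                 # heuristic 3
            best_nn, best_dist = mbm(c, Q, M, (best_nn, best_dist))
    return best_nn, best_dist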

References
[AMN+98] Arya, S., Mount, D., Netanyahu, N., Silverman, R., Wu, A. An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions. Journal of the ACM, 45(6), 1998.
[BGRS99] Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U. When Is "Nearest Neighbor" Meaningful? ICDT, 1999.
[BKSS90] Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD, 1990.
[G84] Guttman, A. R-trees: A Dynamic Index Structure for Spatial Searching. SIGMOD, 1984.
[JMF99] Jain, A., Murty, M., Flynn, P. Data Clustering: A Review. ACM Computing Surveys, 31(3), 1999.