Group nearest neighbor queries
Citations
When is "nearest neighbor" meaningful?
The new Casper: query processing for location services without compromising privacy
Monitoring k-nearest neighbor queries over moving objects
Conceptual partitioning: an efficient method for continuous nearest neighbor monitoring
On trip planning queries in spatial databases
References
Data clustering: a review
R-trees: a dynamic index structure for spatial searching
The R*-tree: an efficient and robust access method for points and rectangles
An optimal algorithm for approximate nearest neighbor searching fixed dimensions
When is "nearest neighbor" meaningful?
Frequently Asked Questions (13)
Q2. What future work have the authors mentioned in the paper "Group nearest neighbor queries"?
In the future, the authors intend to explore the application of related techniques to variations of group nearest neighbor search. Furthermore, it would be interesting to study other distance metrics (e.g., network distance) that necessitate alternative pruning heuristics and algorithms. Additional constraints (e.g., a facility may serve at most k users) may further complicate the solutions.
Q3. How can the authors prune a leaf node?
For nodes the authors use the weighted mindist, based on the intuition that nodes with small values are likely to lead to neighbors with small global distance, so that subsequent visits can be pruned by heuristic 5.
Q4. What is the common metric used to prune the search space?
Existing algorithms for point NN queries using R-trees follow the branch-and-bound paradigm, utilizing some metrics to prune the search space.
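The most common such metric is mindist, a lower bound on the distance from the query point to any point inside a node's minimum bounding rectangle (MBR): if a node's mindist already exceeds the distance of the best neighbor found so far, the node can be pruned. A minimal sketch of this standard computation (the function name and tuple-based MBR representation are illustrative, not taken from the paper):

```python
import math

def mindist(q, lo, hi):
    """Lower bound on the Euclidean distance from point q to any point
    inside the axis-aligned MBR with lower corner lo and upper corner hi.
    Returns 0 when q lies inside the rectangle."""
    total = 0.0
    for qc, lc, hc in zip(q, lo, hi):
        # per-axis distance to the nearest face of the MBR (0 if inside the slab)
        d = max(lc - qc, 0.0, qc - hc)
        total += d * d
    return math.sqrt(total)
```

For example, a query point at the origin has mindist sqrt(2) to the MBR [1,2] x [1,2], while a point inside that rectangle has mindist 0.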
Q5. Why is heuristic 3 used for a node?
Because heuristic 3 requires multiple distance computations (one for each query point) it is applied only for nodes that pass heuristic 2.
Q6. What is the heuristic for storing the qualifying list?
The authors store the qualifying list as an in-memory hash table on point ids to facilitate the retrieval of information (i.e., counter(pi), curr_dist(pi)) about particular points (pi).
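In Python terms, such a qualifying list is simply a dict keyed on point id; a hypothetical sketch (the `update` helper and tuple layout are assumptions for illustration, not the paper's code):

```python
# Qualifying list: hash table on point ids, storing per point the number of
# queries that have retrieved it (counter) and its accumulated distance so far.
qualifying = {}

def update(point_id, partial_dist):
    """Record one more retrieval of point_id, adding its partial distance."""
    counter, curr_dist = qualifying.get(point_id, (0, 0.0))
    qualifying[point_id] = (counter + 1, curr_dist + partial_dist)
```

Lookups and updates are then O(1) on average, which is what makes the per-point bookkeeping cheap.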
Q7. What is the algorithm for retrieving a query?
The query is submitted to n retrieval engines that return the best matches for particular features together with their similarity scores, i.e., the first engine will output a set of matches according to color, the second according to arrangement and so on.
Q8. What is the cost of varying the relative workspaces of the two datasets?
Since now the query cardinality n is fixed to that of the corresponding dataset, the authors perform experiments by varying the relative workspaces of the two datasets.
Q9. How does MQM achieve locality of the nodes?
In order to achieve locality of the node accesses for individual queries, the authors sort the points in Q according to their Hilbert value; thus, two subsequent queries are likely to correspond to nearby points and access similar R-tree nodes.
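The paper does not prescribe an implementation for computing Hilbert values; a common choice is the classic rotate-and-accumulate bit-manipulation method for 2-D grids, sketched below (function name and grid size are illustrative):

```python
def hilbert_value(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve filling an
    n x n grid, where n is a power of two. Classic quadrant-rotation method."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the sub-curve orientation lines up
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting query points by Hilbert value keeps consecutive queries spatially
# close, so they tend to access overlapping sets of R-tree nodes.
Q = [(3, 1), (0, 0), (1, 2)]
Q.sort(key=lambda p: hilbert_value(4, p[0], p[1]))
```

On a 2x2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in that order, which is the locality-preserving property the sort exploits.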
Q10. What is the distance between a data point p and qi?
The distance between a data point p and Q is defined as dist(p,Q) = ∑_{i=1..n} |p qi|, where |p qi| is the Euclidean distance between p and query point qi.
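This sum-of-distances aggregate is straightforward to compute; a minimal sketch in 2-D (the function name is illustrative):

```python
import math

def dist(p, Q):
    """Aggregate distance of data point p to the query group Q:
    the sum of Euclidean distances |p qi| over all query points qi in Q."""
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in Q)
```

For instance, with Q = {(3,4), (0,5)} the point (0,0) has aggregate distance 5 + 5 = 10.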
Q11. What is the way to avoid the same computations?
A possible optimization is to keep each NN in memory, together with its distances to all groups, so that the authors avoid these computations if the same point is encountered later through another group.
Q12. What is the way to find the k images that are similar to a query?
As an example, consider that a user wants to find the k images that are most similar to a query image, where similarity is defined according to n features, e.g., color histogram, object arrangement, texture, shape etc.
Q13. How does BF achieve the optimal I/O performance?
The best-first (BF) algorithm of [HS99] achieves the optimal I/O performance by maintaining a heap H with the entries visited so far, sorted by their mindist.
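The key property is that entries (nodes and points) are popped in increasing mindist order, so the first data point popped is an exact nearest neighbor. A sketch under simplifying assumptions: the R-tree is modeled as a toy nested-tuple tree (leaf nodes hold points, inner nodes hold (lo, hi, child) MBR entries), which is not the paper's actual structure:

```python
import heapq
import math

def mindist(q, lo, hi):
    """Lower bound on the distance from q to any point in the MBR [lo, hi]."""
    return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def best_first_nn(root, q):
    """Best-first NN search: pop the heap entry with the smallest mindist,
    expanding nodes and returning the first data point reached.
    Toy node format (an assumption): ('leaf', [points]) or
    ('node', [(lo, hi, child), ...])."""
    heap = [(0.0, 0, root)]   # (key, tiebreaker, entry)
    tie = 1
    while heap:
        d, _, (kind, contents) = heapq.heappop(heap)
        if kind == 'point':
            return contents, d            # first point popped is the exact NN
        if kind == 'leaf':
            for p in contents:
                heapq.heappush(heap, (math.dist(q, p), tie, ('point', p)))
                tie += 1
        else:
            for lo, hi, child in contents:
                heapq.heappush(heap, (mindist(q, lo, hi), tie, child))
                tie += 1
    return None, float('inf')
```

Because a node is only expanded when its mindist is the smallest key in the heap, BF never visits a node whose MBR lies entirely beyond the final NN distance, which is the sense in which it is I/O-optimal.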