K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free

doi:10.1109/ICDE.2010.5447837

Proceedings Article

Bin Yao, +2 more

- pp 4-15

TLDR

This work designs algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan.

Abstract:

Finding the k nearest neighbors (kNN) of a query point, or of a set of query points (kNN-Join), is a fundamental problem in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required, which may limit their application to large data sets stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses a user-defined function (UDF), which a query optimizer cannot optimize. We design algorithms that can be implemented with SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach is guaranteed to find an approximate kNN with only a logarithmic number of page accesses in expectation and with a constant approximation ratio, and it can be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large real and synthetic data sets confirm the efficiency and practicality of our approach.
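To make the "random shifts plus plain SQL operators" idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the underlying structure, so the sketch assumes a Z-order (Morton) key over 2-D integer coordinates: each randomly shifted copy of the data is stored in an ordinary indexed table, candidates are the k rows on either side of the query's key in each copy, and a final `ORDER BY ... LIMIT` ranks the candidates by true distance. The table names (`pts`, `z0`, `z1`, ...), the function names, the choice of SQLite, and the default of two random shifts are all hypothetical choices made for illustration.

```python
import random
import sqlite3

BITS = 16  # assumption: coordinates are integers in [0, 2**BITS)


def z_value(x, y, bits=BITS + 1):
    """Interleave the bits of x and y into a Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def build(conn, points, num_shifts=2, seed=0):
    """Store the points plus one indexed Z-key table per shift (hypothetical schema)."""
    rng = random.Random(seed)
    # The unshifted copy plus num_shifts randomly shifted copies.
    shifts = [(0, 0)] + [
        (rng.randrange(2**BITS), rng.randrange(2**BITS)) for _ in range(num_shifts)
    ]
    conn.execute("CREATE TABLE pts (pid INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
    conn.executemany(
        "INSERT INTO pts VALUES (?, ?, ?)",
        [(i, x, y) for i, (x, y) in enumerate(points)],
    )
    for s, (dx, dy) in enumerate(shifts):
        conn.execute(f"CREATE TABLE z{s} (z INTEGER, pid INTEGER)")
        conn.executemany(
            f"INSERT INTO z{s} VALUES (?, ?)",
            [(z_value(x + dx, y + dy), i) for i, (x, y) in enumerate(points)],
        )
        conn.execute(f"CREATE INDEX z{s}_idx ON z{s}(z)")
    return shifts


def approx_knn(conn, shifts, q, k):
    """Approximate kNN of q using only indexed range scans, ORDER BY and LIMIT."""
    qx, qy = q
    cand = set()
    for s, (dx, dy) in enumerate(shifts):
        zq = z_value(qx + dx, qy + dy)
        # k successors and k predecessors of the query key in this shifted copy.
        succ = conn.execute(
            f"SELECT pid FROM z{s} WHERE z >= ? ORDER BY z ASC LIMIT ?", (zq, k)
        ).fetchall()
        pred = conn.execute(
            f"SELECT pid FROM z{s} WHERE z < ? ORDER BY z DESC LIMIT ?", (zq, k)
        ).fetchall()
        cand.update(pid for (pid,) in succ + pred)
    if not cand:
        return []
    # Rank the small candidate set by true squared Euclidean distance.
    marks = ",".join("?" * len(cand))
    return conn.execute(
        f"SELECT pid, (x-?)*(x-?) + (y-?)*(y-?) AS d2 FROM pts "
        f"WHERE pid IN ({marks}) ORDER BY d2, pid LIMIT ?",
        (qx, qx, qy, qy, *sorted(cand), k),
    ).fetchall()


def exact_knn(conn, q, k):
    """Baseline for comparison: full scan ordered by squared distance."""
    qx, qy = q
    return conn.execute(
        "SELECT pid, (x-?)*(x-?) + (y-?)*(y-?) AS d2 FROM pts "
        "ORDER BY d2, pid LIMIT ?",
        (qx, qx, qy, qy, k),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rng = random.Random(42)
    data = [(rng.randrange(2**BITS), rng.randrange(2**BITS)) for _ in range(1000)]
    shifts = build(conn, data)
    print(approx_knn(conn, shifts, (30000, 30000), 3))
    print(exact_knn(conn, (30000, 30000), 3))
```

Because every step is an ordinary indexed query, extra predicates (for example a `WHERE` clause on another column of `pts`) could in principle be added to the same statements and left to the optimizer, which is the property the abstract emphasizes. The sketch makes no claim to reproduce the paper's approximation guarantee; it only shows that the candidates it returns are never closer than the exact answer from `exact_knn`.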
