Proceedings ArticleDOI
K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free
Bin Yao, Feifei Li, Piyush Kumar +2 more
- pp 4-15
TLDR
This work designs algorithms that can be implemented with SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan.
Abstract:
Finding the k nearest neighbors (kNN) of a query point and of a set of query points (kNN-Join) are fundamental problems in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required, which may limit their application to large data sets stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses a user-defined function (UDF), which a query optimizer cannot optimize. We design algorithms that can be implemented with SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach is guaranteed to find the approximate kNN with a constant approximation ratio using only a logarithmic number of page accesses in expectation, and it can be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large real and synthetic data sets confirm the efficiency and practicality of our approach.
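The idea of answering a kNN query with ordinary SQL operators over randomly shifted copies of the data can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the one-dimensional ordering, so a Z-order (Morton) curve is assumed here; the points are 2-D with small non-negative integer coordinates; and the table names (`pts`, `zidx`), column names, and helper functions are all hypothetical. The sketch makes no claim about the approximation ratio it achieves.

```python
import random
import sqlite3


def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def build(points, shifts, bits=16):
    """Load points plus one Z-value per (shift, point) into an in-memory database.

    Assumption: every shifted coordinate stays below 2**bits.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pts (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
    conn.execute("CREATE TABLE zidx (s INTEGER, z INTEGER, id INTEGER)")
    conn.executemany(
        "INSERT INTO pts VALUES (?, ?, ?)",
        [(i, x, y) for i, (x, y) in enumerate(points)],
    )
    for s, (dx, dy) in enumerate(shifts):
        conn.executemany(
            "INSERT INTO zidx VALUES (?, ?, ?)",
            [(s, z_value(x + dx, y + dy, bits), i) for i, (x, y) in enumerate(points)],
        )
    # A plain B-tree index lets the engine find Z-order neighbors by range scan.
    conn.execute("CREATE INDEX zidx_sz ON zidx (s, z)")
    return conn


def approx_knn(conn, shifts, q, k, bits=16):
    """Return up to k (id, squared_distance) rows, nearest first.

    Candidates are the k successors and k predecessors of the query's
    Z-value in each shifted copy; they are then ranked by true distance.
    """
    qx, qy = q
    parts, params = [], []
    for s, (dx, dy) in enumerate(shifts):
        zq = z_value(qx + dx, qy + dy, bits)
        parts.append(
            "SELECT id FROM (SELECT id FROM zidx "
            "WHERE s = ? AND z >= ? ORDER BY z LIMIT ?)"
        )
        parts.append(
            "SELECT id FROM (SELECT id FROM zidx "
            "WHERE s = ? AND z < ? ORDER BY z DESC LIMIT ?)"
        )
        params += [s, zq, k, s, zq, k]
    sql = (
        "SELECT p.id, (p.x - ?) * (p.x - ?) + (p.y - ?) * (p.y - ?) AS d2 "
        "FROM pts p WHERE p.id IN (" + " UNION ".join(parts) + ") "
        "ORDER BY d2, p.id LIMIT ?"
    )
    return conn.execute(sql, [qx, qx, qy, qy] + params + [k]).fetchall()
```

Because the whole query is a single SQL statement over indexed tables, an optimizer can plan it like any other query, which is the property the abstract contrasts with an opaque UDF. A call such as `approx_knn(conn, shifts, (120, 340), 5)` would return the five best candidates found across all shifted copies.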
Citations
Proceedings ArticleDOI
Efficient parallel kNN joins for large data in MapReduce
TL;DR: This work proposes novel (exact and approximate) algorithms in MapReduce to perform efficient parallel kNN joins on large data to meet many practical needs in data mining applications and spatial and multimedia databases.
Posted Content
Efficient Processing of k Nearest Neighbor Joins using MapReduce
TL;DR: This paper investigates how to perform kNN joins using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers, and designs an effective mapping mechanism that exploits pruning rules for distance filtering, reducing both the shuffling and computational costs.
Journal ArticleDOI
Efficient processing of k nearest neighbor joins using MapReduce
TL;DR: Zhang et al. investigate how to perform kNN joins using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers.
Open Street Map
TL;DR: OpenStreetMap creates and provides free geographic data, such as maps of cities and settlements, to anyone who wants them.
Journal ArticleDOI
Design and analysis of a ranking approach to private location-based services
TL;DR: The article's proposal, SpaceTwist, aims to offer location privacy for k nearest neighbor (kNN) queries at low communication cost without requiring a trusted anonymizer and is believed to be the first solution that expresses the server-side functionality in a single SQL statement.
References
Proceedings ArticleDOI
R-trees: a dynamic index structure for spatial searching
TL;DR: A dynamic index structure called an R-tree is described which meets this need, and algorithms for searching and updating it are given. It is concluded that the structure is useful for current database systems in spatial applications.
Proceedings ArticleDOI
The R*-tree: an efficient and robust access method for points and rectangles
TL;DR: The R*-tree is designed to incorporate a combined optimization of the area, margin and overlap of each enclosing rectangle in the directory, and it clearly outperforms the existing R-tree variants.
Proceedings Article
Similarity Search in High Dimensions via Hashing
TL;DR: Experimental results indicate that the novel scheme for approximate similarity search based on hashing scales well even for a relatively large number of dimensions, and provide evidence that the method improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
Journal ArticleDOI
An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
TL;DR: In this paper, it was shown that, given an integer k ≥ 1, a (1 + ϵ)-approximation to the k nearest neighbors of q can be computed in additional O(kd log n) time.
Proceedings ArticleDOI
Nearest neighbor queries
TL;DR: This paper presents an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalizes it to finding the k nearest neighbors.