Proceedings ArticleDOI

K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free

TLDR
This work designs algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan.
Abstract
Finding the k nearest neighbors (kNN) of a query point, or of a set of query points (kNN-Join), is a fundamental problem in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required, which may limit their application to large data sets that are stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses a user-defined function (UDF), which a query optimizer cannot optimize. We design algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach is guaranteed to find an approximate kNN with a constant approximation ratio using only a logarithmic number of page accesses in expectation, and it can be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large real and synthetic data sets confirm the efficiency and practicality of our approach.
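The random-shift idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: points have non-negative integer coordinates, the one-dimensional ordering is a Z-order (Morton) key, and each shifted copy is kept as a sorted list standing in for an ordinary B-tree index on a key column. The names `z_value`, `build_shifted_indexes`, `approx_knn` and the constant `BITS` are hypothetical, and the sketch makes no claim to reproduce the paper's approximation guarantee.

```python
import bisect
import random

BITS = 16  # assumed bits per (shifted) coordinate in this sketch


def z_value(point, bits=BITS):
    """Interleave the bits of integer coordinates into one Z-order key."""
    key = 0
    d = len(point)
    for bit in range(bits):
        for dim, coord in enumerate(point):
            key |= ((coord >> bit) & 1) << (bit * d + dim)
    return key


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_shifted_indexes(points, num_shifts, max_coord, seed=0):
    """Build one sorted key list per shift (the unshifted copy comes first).

    Shifted coordinates must stay below 2**BITS.
    """
    rng = random.Random(seed)
    d = len(points[0])
    shifts = [tuple(0 for _ in range(d))]
    shifts += [tuple(rng.randrange(max_coord) for _ in range(d))
               for _ in range(num_shifts)]
    indexes = []
    for shift in shifts:
        keyed = sorted(
            (z_value(tuple(c + s for c, s in zip(p, shift))), p)
            for p in points
        )
        indexes.append((shift, [k for k, _ in keyed], [p for _, p in keyed]))
    return indexes


def approx_knn(indexes, q, k):
    """Collect up to 2k key-order neighbors of q per shift, then rank by distance."""
    candidates = set()
    for shift, keys, pts in indexes:
        zq = z_value(tuple(c + s for c, s in zip(q, shift)))
        pos = bisect.bisect_left(keys, zq)
        # k entries before and k entries from the query's position onward
        candidates.update(pts[max(0, pos - k): pos + k])
    return sorted(candidates, key=lambda p: sq_dist(p, q))[:k]
```

In a relational setting, each sorted list would plausibly correspond to an indexed key column, and the two slices to a pair of range scans in descending and ascending key order, each limited to k rows. That mapping is an illustration only, not the paper's exact query.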


Citations
Proceedings ArticleDOI

Efficient parallel kNN joins for large data in MapReduce

TL;DR: This work proposes novel (exact and approximate) algorithms in MapReduce to perform efficient parallel kNN joins on large data to meet many practical needs in data mining applications and spatial and multimedia databases.
Posted Content

Efficient Processing of k Nearest Neighbor Joins using MapReduce

TL;DR: This paper investigates how to perform kNN join using MapReduce, which is a well-accepted framework for data-intensive applications over clusters of computers, and designs an effective mapping mechanism that exploits pruning rules for distance filtering, thereby reducing both the shuffling and computational costs.
Journal ArticleDOI

Efficient processing of k nearest neighbor joins using MapReduce

TL;DR: The authors investigate how to perform kNN join using MapReduce, which is a well-accepted framework for data-intensive applications over clusters of computers.

Open Street Map

TL;DR: OpenStreetMap creates and provides free geographic data, such as maps of cities and settlements, to anyone who wants them.
Journal ArticleDOI

Design and analysis of a ranking approach to private location-based services

TL;DR: The article's proposal, SpaceTwist, aims to offer location privacy for k nearest neighbor (kNN) queries at low communication cost without requiring a trusted anonymizer and is believed to be the first solution that expresses the server-side functionality in a single SQL statement.
References
Proceedings ArticleDOI

R-trees: a dynamic index structure for spatial searching

TL;DR: A dynamic index structure called an R-tree is described which meets this need, and algorithms for searching and updating it are given and it is concluded that it is useful for current database systems in spatial applications.
Proceedings ArticleDOI

The R*-tree: an efficient and robust access method for points and rectangles

TL;DR: The R*-tree is designed to incorporate a combined optimization of area, margin and overlap of each enclosing rectangle in the directory, and it clearly outperforms the existing R-tree variants.
Proceedings Article

Similarity Search in High Dimensions via Hashing

TL;DR: Experimental results indicate that the novel scheme for approximate similarity search based on hashing scales well even for a relatively large number of dimensions, and provide experimental evidence that the method gives improvement in running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
Journal ArticleDOI

An optimal algorithm for approximate nearest neighbor searching in fixed dimensions

TL;DR: In this paper, it was shown that given an integer k ≥ 1, (1 + ϵ)-approximation to the k nearest neighbors of q can be computed in additional O(kd log n) time.
Proceedings ArticleDOI

Nearest neighbor queries

TL;DR: This paper presents an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalizes it to finding the k nearest neighbors.