Open Access Proceedings Article

Near Neighbor Search in Large Metric Spaces

Sergey Brin
pp. 574-584
TLDR
A data structure for finding approximate matches in a large database, called a GNAT (Geometric Near-neighbor Access Tree), is introduced. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry.
Abstract
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional or, more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.

Keywords: near neighbor, metric space, approximate queries, data mining, Dirichlet domains, Voronoi regions
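
The abstract does not spell out how the tree is built, so the following is only a minimal sketch of the general idea it describes: a hierarchy of Dirichlet (Voronoi-like) domains around chosen split points, searched with triangle-inequality pruning so that few distance calculations are needed. The names `GnatNode`, `range_query`, `degree`, and `leaf_size` are hypothetical, and the random choice of split points and the fixed degree are simplifying assumptions, not necessarily what the paper does.

```python
import random


def _dist_range(dist, center, members):
    """(min, max) distance from center to any member."""
    ds = [dist(center, m) for m in members]
    return min(ds), max(ds)


class GnatNode:
    """One node of a simplified GNAT-style tree (illustrative sketch only)."""

    def __init__(self, points, dist, degree=4, leaf_size=8, rng=None):
        rng = rng or random.Random(0)
        self.dist = dist
        self.splits, self.children, self.ranges = [], [], []
        self.bucket = list(points)
        if len(self.bucket) <= max(leaf_size, degree):
            return  # small sets are kept as a plain bucket (leaf)
        # Assumption: split points are picked uniformly at random.
        chosen = set(rng.sample(range(len(self.bucket)), degree))
        self.splits = [self.bucket[i] for i in sorted(chosen)]
        rest = [p for i, p in enumerate(self.bucket) if i not in chosen]
        self.bucket = []
        # Dirichlet domains: each point joins its nearest split point.
        groups = [[] for _ in self.splits]
        for p in rest:
            ds = [dist(p, s) for s in self.splits]
            groups[ds.index(min(ds))].append(p)
        # ranges[i][j]: distance range from split i to domain j (incl. split j).
        self.ranges = [
            [_dist_range(dist, s, [self.splits[j]] + groups[j])
             for j in range(degree)]
            for s in self.splits
        ]
        self.children = [GnatNode(g, dist, degree, leaf_size, rng) for g in groups]

    def range_query(self, q, r):
        """Return all stored points within distance r of q."""
        out = [p for p in self.bucket if self.dist(q, p) <= r]
        alive = [True] * len(self.splits)
        for i, s in enumerate(self.splits):
            if not alive[i]:
                continue  # domain i was already ruled out, skip the distance
            d = self.dist(q, s)
            if d <= r:
                out.append(s)
            # Triangle inequality: domain j cannot contain a match if
            # [d - r, d + r] misses the stored range from split i to domain j.
            for j, (lo, hi) in enumerate(self.ranges[i]):
                if d + r < lo or d - r > hi:
                    alive[j] = False
        for j, child in enumerate(self.children):
            if alive[j]:
                out.extend(child.range_query(q, r))
        return out


if __name__ == "__main__":
    # Usage with a hypothetical Manhattan metric on integer grid points.
    manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    data = [(x, y) for x in range(20) for y in range(20)]
    tree = GnatNode(data, manhattan)
    print(sorted(tree.range_query((5, 5), 1)))
```

Only the metric function is required, so the same sketch applies to any distance that satisfies the triangle inequality; the pruning step is what saves distance calculations when they are expensive.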



Citations
Proceedings Article

Fast approximate nearest neighbors with automatic algorithm configuration

TL;DR: A system that answers the question, “What is the fastest approximate nearest-neighbor algorithm for my data?” and a new algorithm that applies priority search on hierarchical k-means trees, which is found to provide the best known performance on many datasets.
Journal Article

Accelerating t-SNE using tree-based algorithms

TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) are developed and shown to substantially accelerate t-SNE and make it possible to learn embeddings of data sets with millions of objects.
Proceedings Article

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

TL;DR: The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
Journal Article

Scalable Nearest Neighbor Algorithms for High Dimensional Data

TL;DR: It is shown that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and an automated configuration procedure for finding the best algorithm to search a particular data set is described.
Journal Article

Searching in metric spaces

TL;DR: A unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework, and presents a quantitative definition of the elusive concept of "intrinsic dimensionality".
References
Journal Article

Comparing images using the Hausdorff distance

TL;DR: Efficient algorithms for computing the Hausdorff distance between all possible relative positions of a binary image and a model are presented and it is shown that the method extends naturally to the problem of comparing a portion of a model against an image.
Proceedings Article

Data structures and algorithms for nearest neighbor search in general metric spaces

TL;DR: The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.
Journal Article

A Branch and Bound Algorithm for Computing k-Nearest Neighbors

TL;DR: The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors by eliminating the necessity of calculating many distances.
Journal Article

Satisfying general proximity/similarity queries with metric trees

TL;DR: Divide-and-conquer search strategies are described for satisfying proximity queries involving arbitrary distance metrics.
Journal Article

Approximate string-matching with q-grams and maximal matches

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit distance based string matching.