Near Neighbor Search in Large Metric Spaces

Open Access Proceedings Article

Sergey Brin

- pp 574-584

TLDR

A data structure called a GNAT (Geometric Near-neighbor Access Tree) is introduced to solve the problem of finding approximate matches in a large database. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry.

Abstract:

Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional or, more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.

Keywords: near neighbor, metric space, approximate queries, data mining, Dirichlet domains, Voronoi regions
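The abstract names the idea (a hierarchy of Dirichlet domains that models the geometry of the data) without spelling out the construction. The following is a minimal sketch of one way such a tree could be built and queried, not the paper's implementation. It assumes that split points are chosen uniformly at random, that the branching factor is fixed, and that small subsets are stored as flat buckets. The names `GnatNode`, `degree`, `leaf_size` and `range_search` are hypothetical and chosen for illustration only.

```python
import math
import random


class GnatNode:
    """Sketch of a GNAT-style node. Assumptions: random split points,
    fixed degree, flat buckets at the leaves."""

    def __init__(self, points, dist, degree=4, leaf_size=8, rng=None):
        rng = rng or random.Random(0)
        self.dist = dist
        self.bucket, self.splits, self.children = [], [], []
        if len(points) <= max(leaf_size, degree):
            self.bucket = list(points)  # small set: keep it as a leaf
            return
        # Assumption: split points are picked uniformly at random.
        idx = set(rng.sample(range(len(points)), degree))
        self.splits = [points[i] for i in sorted(idx)]
        rest = [p for i, p in enumerate(points) if i not in idx]
        k = degree
        # lo[i][j], hi[i][j]: min and max distance from split i to any
        # member of domain j, where domain j includes split j itself.
        self.lo = [[dist(si, sj) for sj in self.splits] for si in self.splits]
        self.hi = [row[:] for row in self.lo]
        domains = [[] for _ in range(k)]
        for p in rest:
            ds = [dist(s, p) for s in self.splits]
            j = min(range(k), key=ds.__getitem__)  # nearest split point
            domains[j].append(p)
            for i in range(k):
                self.lo[i][j] = min(self.lo[i][j], ds[i])
                self.hi[i][j] = max(self.hi[i][j], ds[i])
        self.children = [
            GnatNode(d, dist, degree, leaf_size, rng) if d else None
            for d in domains
        ]

    def range_search(self, q, r):
        """Return every stored point within distance r of q."""
        out = [p for p in self.bucket if self.dist(q, p) <= r]
        alive = set(range(len(self.splits)))
        for i in range(len(self.splits)):
            if i not in alive:
                continue  # already ruled out, no distance computed
            d = self.dist(q, self.splits[i])
            if d <= r:
                out.append(self.splits[i])
            # Triangle inequality: if [d - r, d + r] misses the stored
            # range for domain j, nothing in domain j can be within r.
            for j in list(alive):
                if j != i and (d - r > self.hi[i][j] or d + r < self.lo[i][j]):
                    alive.discard(j)
        for j in alive:
            if self.children[j] is not None:
                out.extend(self.children[j].range_search(q, r))
        return out


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.random(), gen.random()) for _ in range(1000)]
    tree = GnatNode(data, math.dist)
    print(len(tree.range_search((0.5, 0.5), 0.1)))
```

The distance function is passed in as a parameter, so the same sketch works for any metric, which matters in the setting the abstract describes, where distance calculations are the dominant cost. The stored ranges let a query discard whole domains without computing the distance to their split points.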

Citations

Proceedings Article

Fast approximate nearest neighbors with automatic algorithm configuration

Marius Muja, +1 more

TL;DR: A system that answers the question, “What is the fastest approximate nearest-neighbor algorithm for my data?” and a new algorithm that applies priority search on hierarchical k-means trees, which is found to provide the best known performance on many datasets.

Journal ArticleDOI

Accelerating t-SNE using tree-based algorithms

Laurens van der Maaten

- 01 Jan 2014 -

Journal of Machine Learning Research

TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) are developed and shown to substantially accelerate t-SNE, making it possible to learn embeddings of data sets with millions of objects.

Proceedings Article

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

Paolo Ciaccia, +2 more

TL;DR: The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.

Journal ArticleDOI

Scalable Nearest Neighbor Algorithms for High Dimensional Data

Marius Muja, +1 more

- 01 May 2014 -

IEEE Transactions on Pattern Analysis an...

TL;DR: It is shown that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and an automated configuration procedure for finding the best algorithm to search a particular data set is described.

Journal ArticleDOI

Searching in metric spaces

Edgar Chávez, +3 more

- 01 Sep 2001 -

ACM Computing Surveys

TL;DR: Presents a unified view of all the known proposals to organize metric spaces, so that they can be understood under a common framework, together with a quantitative definition of the elusive concept of "intrinsic dimensionality".

References

Journal ArticleDOI

Comparing images using the Hausdorff distance

Daniel P. Huttenlocher, +2 more

- 01 Sep 1993 -

IEEE Transactions on Pattern Analysis an...

TL;DR: Efficient algorithms for computing the Hausdorff distance between all possible relative positions of a binary image and a model are presented and it is shown that the method extends naturally to the problem of comparing a portion of a model against an image.

Proceedings ArticleDOI

Data structures and algorithms for nearest neighbor search in general metric spaces

Peter N. Yianilos

TL;DR: The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.

Journal ArticleDOI

A Branch and Bound Algorithm for Computing k-Nearest Neighbors

Keinosuke Fukunaga, +1 more

- 01 Jul 1975 -

IEEE Transactions on Computers

TL;DR: The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors, by eliminating the necessity of calculating many distances.

Journal ArticleDOI

Satisfying general proximity / similarity queries with metric trees

Jeffrey Uhlmann

- 25 Nov 1991 -

Information Processing Letters

TL;DR: Divide-and-conquer search strategies are described for satisfying proximity queries involving arbitrary distance metrics.

Journal ArticleDOI

Approximate string-matching with q-grams and maximal matches

Esko Ukkonen

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit-distance-based string matching.

Related Papers (5)

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

Data structures and algorithms for nearest neighbor search in general metric spaces

Searching in metric spaces

R-trees: a dynamic index structure for spatial searching

The R*-tree: an efficient and robust access method for points and rectangles