Open Access Proceedings Article
Near Neighbor Search in Large Metric Spaces
Sergey Brin
- pp 574-584
TLDR
A data structure called a GNAT (Geometric Near-neighbor Access Tree) is introduced to solve the problem of finding approximate matches in a large database. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry.
Abstract:
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional or, more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.
Keywords: near neighbor, metric space, approximate queries, data mining, Dirichlet domains, Voronoi regions
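The idea of indexing by geometry rather than by coordinates can be illustrated with a small sketch. The code below is not the GNAT structure from the paper, which is hierarchical. It is a single-level simplification that assumes only a distance function obeying the triangle inequality. Randomly chosen split points partition the data into Dirichlet domains (each point goes to its nearest split point), and a range query skips any domain that the triangle inequality proves is too far away. The names `build_index` and `range_search`, the parameter `k`, and the random choice of split points are all hypothetical choices made for this illustration.

```python
import math
import random


def build_index(points, dist, k, seed=0):
    """Pick k split points at random and assign each point to its nearest one."""
    rng = random.Random(seed)
    splits = rng.sample(points, k)
    cells = [{"center": s, "members": [], "radius": 0.0} for s in splits]
    for p in points:
        # Nearest split point defines the Dirichlet domain that p falls in.
        d, i = min((dist(p, c["center"]), i) for i, c in enumerate(cells))
        cells[i]["members"].append(p)
        # Track the farthest member so whole cells can be pruned later.
        cells[i]["radius"] = max(cells[i]["radius"], d)
    return cells


def range_search(cells, dist, query, r):
    """Return all points within distance r of query, plus the number of
    distance evaluations that were needed."""
    hits, n_dist = [], 0
    for cell in cells:
        d_center = dist(query, cell["center"])
        n_dist += 1
        # Triangle inequality: every member m satisfies
        # dist(query, m) >= d_center - radius, so the cell can be skipped.
        if d_center - cell["radius"] > r:
            continue
        for m in cell["members"]:
            n_dist += 1
            if dist(query, m) <= r:
                hits.append(m)
    return hits, n_dist


# Illustrative data: 2000 random points in the unit square, Euclidean metric.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(2000)]
cells = build_index(points, math.dist, k=20)
hits, n_dist = range_search(cells, math.dist, (0.5, 0.5), 0.05)
print(len(hits), "matches using", n_dist, "distance evaluations")
```

Because the sketch touches the data only through `dist`, any metric (for example an edit distance on strings) could be substituted for `math.dist`. Counting distance evaluations reflects the abstract's concern that distance calculations are the expensive operation.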
Citations
Proceedings Article
Fast approximate nearest neighbors with automatic algorithm configuration
Marius Muja, David G. Lowe
TL;DR: A system that answers the question, “What is the fastest approximate nearest-neighbor algorithm for my data?” and a new algorithm that applies priority search on hierarchical k-means trees, which is found to provide the best known performance on many datasets.
Journal ArticleDOI
Accelerating t-SNE using tree-based algorithms
TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) are developed and shown to substantially accelerate t-SNE, making it possible to learn embeddings of data sets with millions of objects.
Proceedings Article
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
TL;DR: The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
Journal ArticleDOI
Scalable Nearest Neighbor Algorithms for High Dimensional Data
Marius Muja, David G. Lowe
TL;DR: It is shown that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and an automated configuration procedure for finding the best algorithm to search a particular data set is described.
Journal ArticleDOI
Searching in metric spaces
TL;DR: A unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework, and presents a quantitative definition of the elusive concept of "intrinsic dimensionality".
References
Journal ArticleDOI
Comparing images using the Hausdorff distance
TL;DR: Efficient algorithms for computing the Hausdorff distance between all possible relative positions of a binary image and a model are presented and it is shown that the method extends naturally to the problem of comparing a portion of a model against an image.
Proceedings ArticleDOI
Data structures and algorithms for nearest neighbor search in general metric spaces
TL;DR: The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.
Journal ArticleDOI
A Branch and Bound Algorithm for Computing k-Nearest Neighbors
TL;DR: The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors by eliminating the necessity of calculating many distances.
Journal ArticleDOI
Satisfying general proximity / similarity queries with metric trees
TL;DR: Divide-and-conquer search strategies are described for satisfying proximity queries involving arbitrary distance metrics.
Journal ArticleDOI
Approximate string-matching with q-grams and maximal matches
TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit-distance-based string matching.