Data structures and algorithms for nearest neighbor search in general metric spaces
Peter N. Yianilos
- pp. 311–321
TLDR
The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.
Abstract:
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, kd-tree performance is compared.
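The construction and search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vantage point is taken as the first element (the paper studies better selection strategies), the split is at the median distance, and search prunes a subtree via the triangle inequality whenever the query ball cannot cross the split radius. All names (`VPNode`, `build_vp_tree`, `nn_search`) are invented for this sketch.

```python
import math

class VPNode:
    def __init__(self, point, mu, inside, outside):
        self.point = point      # vantage point
        self.mu = mu            # median distance from the vantage point
        self.inside = inside    # subtree of points with d(vp, p) <= mu
        self.outside = outside  # subtree of points with d(vp, p) > mu

def build_vp_tree(points, dist):
    """Recursively build a vp-tree; median splits give expected O(n log n)."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    ds = sorted(dist(vp, p) for p in rest)
    mu = ds[len(ds) // 2]
    inside = [p for p in rest if dist(vp, p) <= mu]
    outside = [p for p in rest if dist(vp, p) > mu]
    return VPNode(vp, mu, build_vp_tree(inside, dist),
                  build_vp_tree(outside, dist))

def nn_search(node, q, dist, best=None):
    """Nearest-neighbor search with triangle-inequality pruning."""
    if node is None:
        return best
    d = dist(q, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Descend first into the side of the split that contains q.
    near, far = ((node.inside, node.outside) if d <= node.mu
                 else (node.outside, node.inside))
    best = nn_search(near, q, dist, best)
    # The far side can hold a closer point only if the query ball
    # of radius best[0] crosses the splitting radius mu.
    if abs(d - node.mu) < best[0]:
        best = nn_search(far, q, dist, best)
    return best
```

Usage with a Euclidean metric, checked against brute force:

```python
pts = [((i * 37) % 100 / 100.0, (i * 61) % 100 / 100.0) for i in range(50)]
tree = build_vp_tree(pts, math.dist)
d_best, p_best = nn_search(tree, (0.5, 0.5), math.dist)
assert d_best == min(math.dist((0.5, 0.5), p) for p in pts)
```

Note that only the metric's axioms are used, so the same code applies to any distance satisfying the triangle inequality, which is the point of the paper's "general metric spaces" setting.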
Citations
Patent
Method of indexed storage and retrieval of multidimensional information
TL;DR: In this paper, a tree-structured index to multidimensional data is created using naturally occurring patterns and clusters within the data which permit efficient search and retrieval strategies in a database of DNA profiles.
Book ChapterDOI
A content-addressable network for similarity search in metric spaces
TL;DR: In this article, the authors present a scalable and distributed access structure for similarity search in metric spaces based on the Content-addressable Network (CAN) paradigm, which provides a Distributed Hash Table (DHT) abstraction over a Cartesian space.
Proceedings ArticleDOI
Boosting nearest neighbor classifiers for multiclass recognition
Vassilis Athitsos,Stan Sclaroff +1 more
TL;DR: An algorithm that uses boosting to learn a distance measure for multiclass k-nearest neighbor classification and achieves lower error rates in some of the datasets, which indicates that it is a method worth considering for nearest neighbor recognition in various pattern recognition domains.
Journal ArticleDOI
Graph based k-means clustering
TL;DR: An original approach to clustering multi-component data sets is proposed; it includes an estimate of the number of clusters, which is used to initialize the generalized Lloyd algorithm (k-means) and thereby circumvents its well-known initialization problems.
Dissertation
Efficient algorithms for new computational models
TL;DR: Algorithms and complexity results for these computational models follow the central thesis that it is an important part of theoretical computer science to model real-world computational structures, and that such effort is richly rewarded by a plethora of interesting and challenging problems.
References
Book
Introduction to Statistical Pattern Recognition
TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.
Journal ArticleDOI
Voronoi diagrams—a survey of a fundamental geometric data structure
TL;DR: The Voronoi diagram partitions the plane into cells, one per site, such that each cell contains exactly the points closer to its site than to any other; this survey covers the structure's properties, generalizations, and construction algorithms.
Journal ArticleDOI
An Algorithm for Finding Best Matches in Logarithmic Expected Time
TL;DR: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.
Journal ArticleDOI
A Branch and Bound Algorithm for Computing k-Nearest Neighbors
TL;DR: The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors, by eliminating the necessity of calculating many distances.