Proceedings ArticleDOI

Data structures and algorithms for nearest neighbor search in general metric spaces

01 Jan 1993-pp 311-321
TL;DR: The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.
Abstract: We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are high-dimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kd-tree performance is compared.
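
The construction and search procedures summarized above can be sketched compactly. The following is a minimal illustration rather than the paper's implementation: names such as VPNode and build_vp_tree are invented for the example, the vantage point is chosen uniformly at random, the split is at the median distance to it, and the triangle inequality is used to prune subtrees during search.

```python
import random

class VPNode:
    """One node of a vantage-point tree: a point, the median distance
    (radius) that splits its subtree, and inside/outside children."""
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build_vp_tree(points, dist):
    if not points:
        return None
    i = random.randrange(len(points))
    vp, rest = points[i], points[:i] + points[i + 1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    pairs = [(dist(vp, p), p) for p in rest]
    mu = sorted(d for d, _ in pairs)[len(pairs) // 2]   # median distance
    inside = [p for d, p in pairs if d < mu]
    outside = [p for d, p in pairs if d >= mu]
    return VPNode(vp, mu, build_vp_tree(inside, dist), build_vp_tree(outside, dist))

def vp_nearest(node, query, dist, best=(None, float("inf"))):
    """Return (point, distance) of the best neighbor found, pruning
    subtrees that the triangle inequality proves cannot help."""
    if node is None:
        return best
    d = dist(query, node.point)
    if d < best[1]:
        best = (node.point, d)
    if d < node.radius:                       # query falls inside the ball
        best = vp_nearest(node.inside, query, dist, best)
        if d + best[1] >= node.radius:        # ball around query crosses mu
            best = vp_nearest(node.outside, query, dist, best)
    else:
        best = vp_nearest(node.outside, query, dist, best)
        if d - best[1] <= node.radius:
            best = vp_nearest(node.inside, query, dist, best)
    return best

# Example with a Euclidean metric; any metric obeying the triangle
# inequality can be substituted.
pts = [(random.random(), random.random()) for _ in range(1000)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
tree = build_vp_tree(pts, euclid)
print(vp_nearest(tree, (0.5, 0.5), euclid))
```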


Citations
Journal ArticleDOI
TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

6,447 citations

MonographDOI
01 Jan 2006
TL;DR: This coherent and comprehensive book unifies material from several sources, including robotics, control theory, artificial intelligence, and algorithms, into planning under differential constraints that arise when automating the motions of virtually any mechanical system.
Abstract: Planning algorithms are impacting technical disciplines and industries around the world, including robotics, computer-aided design, manufacturing, computer graphics, aerospace applications, drug design, and protein folding. This coherent and comprehensive book unifies material from several sources, including robotics, control theory, artificial intelligence, and algorithms. The treatment is centered on robot motion planning but integrates material on planning in discrete spaces. A major part of the book is devoted to planning under uncertainty, including decision theory, Markov decision processes, and information spaces, which are the “configuration spaces” of all sensor-based planning problems. The last part of the book delves into planning under differential constraints that arise when automating the motions of virtually any mechanical system. Developed from courses taught by the author, the book is intended for students, engineers, and researchers in robotics, artificial intelligence, and control theory as well as computer graphics, algorithms, and computational biology.

6,340 citations


Cites background from "Data structures and algorithms for ..."

  • ...For more information on efficient nearest-neighbor searching, see the recent survey [478], and [47, 48, 49, 53, 101, 232, 367, 479, 541, 761, 909, 989]....

    [...]

Proceedings ArticleDOI
23 May 1998
TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R^d, which require space that is only polynomial in n and d.
Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors' STOC'98 and FOCS'01 papers. It unifies, generalizes and simplifies the results from those papers.
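
This line of work is closely associated with locality-sensitive hashing. As an illustration of the general idea only, the sketch below uses a random-hyperplane hash family for cosine similarity, not the specific schemes of the STOC'98/FOCS'01 papers, and the function names are invented for the example.

```python
import numpy as np

def build_lsh_index(data, n_bits=16, n_tables=8, seed=0):
    """Hash every vector into n_tables tables, keyed by the signs of
    n_bits random projections; nearby vectors collide with high probability."""
    rng = np.random.default_rng(seed)
    planes = [rng.standard_normal((n_bits, data.shape[1])) for _ in range(n_tables)]
    tables = [dict() for _ in range(n_tables)]
    for idx, x in enumerate(data):
        for plane, table in zip(planes, tables):
            table.setdefault(tuple((plane @ x) > 0), []).append(idx)
    return planes, tables

def query_lsh(q, data, planes, tables):
    """Gather candidates that collide with q in any table, then rank the
    (small) candidate set by exact distance."""
    candidates = set()
    for plane, table in zip(planes, tables):
        candidates.update(table.get(tuple((plane @ q) > 0), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: np.linalg.norm(data[i] - q))

data = np.random.default_rng(1).standard_normal((10_000, 64))
planes, tables = build_lsh_index(data)
print(query_lsh(data[42] + 0.01, data, planes, tables))   # likely returns 42
```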

4,478 citations

Journal ArticleDOI
TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) are developed and shown to substantially accelerate t-SNE, making it possible to learn embeddings of data sets with millions of objects.
Abstract: The paper investigates the acceleration of t-SNE--an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots--using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
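
The Barnes-Hut step can be sketched as follows, as a simplified illustration rather than the authors' implementation: a 2-D quadtree is built over the embedding, and any cell that is small relative to its distance from the query point (ratio below a threshold theta) is treated as a single point at its centre of mass. The exact t-SNE gradient involves additional normalization, and the point's own term is not excluded here; all names are invented for the example.

```python
import numpy as np

THETA = 0.5  # accuracy/speed trade-off: smaller means more exact

class Cell:
    """Quadtree cell summarizing its points by count and centre of mass."""
    def __init__(self, center, half_size, points):
        self.com = points.mean(axis=0)
        self.n = len(points)
        self.size = 2.0 * half_size
        self.children = []
        if self.n > 1 and half_size > 1e-9:
            quads = {}
            for p in points:
                quads.setdefault((p[0] >= center[0], p[1] >= center[1]), []).append(p)
            for (gx, gy), pts in quads.items():
                off = np.array([0.5 if gx else -0.5, 0.5 if gy else -0.5])
                self.children.append(Cell(center + half_size * off,
                                          half_size / 2.0, np.array(pts)))

def repulsion(q, cell):
    """Accumulate approximate t-SNE-style repulsive terms for point q,
    treating well-separated cells as cell.n copies of their centre of mass."""
    diff = q - cell.com
    d2 = float(diff @ diff)
    if not cell.children or (d2 > 0.0 and cell.size < THETA * np.sqrt(d2)):
        w = 1.0 / (1.0 + d2)                 # Student-t kernel used by t-SNE
        return cell.n * w * w * diff, cell.n * w
    force, z = np.zeros_like(q, dtype=float), 0.0
    for child in cell.children:
        f, zc = repulsion(q, child)
        force, z = force + f, z + zc
    return force, z

# Build one tree over the 2-D embedding, then query it once per point.
emb = np.random.default_rng(0).standard_normal((5000, 2))
root = Cell(emb.mean(axis=0), np.abs(emb - emb.mean(axis=0)).max() + 1e-6, emb)
print(repulsion(emb[0], root))
```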

2,079 citations


Cites background or methods from "Data structures and algorithms for ..."

  • ...approximation of the similarities between the input objects using vantage-point trees (Yianilos, 1993), and subsequently, they approximate the forces between the points in the embedding with the help of either a Barnes-Hut algorithm (Barnes and Hut, 1986) or a dual-tree algorithm (Gray and Moore, 2001, 2003)....

    [...]

  • ...In a vantage-point tree, each node stores an input object and the radius of a (hyper)ball that is centered on this object (Yianilos, 1993)....

    [...]

  • ..., 2006), vantage-point trees (Yianilos, 1993), and trees constructed using hierarchical clustering (Fukunaga and Narendra, 1975; Brin, 1995; Nister and Stewenius, 2006)....

    [...]

Proceedings Article
06 Jul 2015
TL;DR: It is demonstrated on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedented low k-nearest neighbor document classification error rates.
Abstract: We present the Word Mover's Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local cooccurrences in sentences. The WMD distance measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document need to "travel" to reach the embedded words of another document. We show that this distance metric can be cast as an instance of the Earth Mover's Distance, a well studied transportation problem for which several highly efficient solvers have been developed. Our metric has no hyperparameters and is straight-forward to implement. Further, we demonstrate on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedented low k-nearest neighbor document classification error rates.
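
The casting of WMD as a transportation problem can be made concrete with a small sketch. This is an illustrative formulation only, not the authors' solver: documents are represented here by arbitrary embedding matrices and normalized word frequencies, and the linear program is handed to scipy's generic LP routine rather than a specialized Earth Mover's Distance solver.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(X1, w1, X2, w2):
    """Word Mover's Distance as a transportation LP. X1, X2 hold the word
    embeddings of two documents (one row per word); w1, w2 are their
    normalized word frequencies. Minimize sum_ij T_ij * ||x_i - x'_j||
    subject to row sums w1, column sums w2, and T >= 0."""
    n, m = len(w1), len(w2)
    cost = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2).ravel()
    A_rows = np.zeros((n, n * m))            # sum_j T_ij = w1[i]
    for i in range(n):
        A_rows[i, i * m:(i + 1) * m] = 1.0
    A_cols = np.zeros((m, n * m))            # sum_i T_ij = w2[j]
    for j in range(m):
        A_cols[j, j::m] = 1.0
    res = linprog(cost,
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([w1, w2]),
                  bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(0)
doc1, doc2 = rng.standard_normal((4, 50)), rng.standard_normal((6, 50))
print(wmd(doc1, np.full(4, 0.25), doc2, np.full(6, 1 / 6)))
```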

1,786 citations


Cites background from "Data structures and algorithms for ..."

  • ...The nearest neighbor search has a time complexity of O(p^2), and it can be sped up further by leveraging out-of-the-box tools for fast (approximate or exact) nearest neighbor retrieval (Garcia et al., 2008; Yianilos, 1993; Andoni & Indyk, 2006)....

    [...]

References
Book
01 Jan 1972
TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.
Abstract: This completely revised second edition presents an introduction to statistical pattern recognition. Pattern recognition in general covers a wide range of problems: it is applied to engineering problems, such as character readers and wave form analysis, as well as to brain modeling in biology and psychology. Statistical decision and estimation, which are the main subjects of this book, are regarded as fundamental to the study of pattern recognition. This book is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field. Each chapter contains computer projects as well as exercises.

10,526 citations

Journal ArticleDOI
TL;DR: The Voronoi diagram, as discussed by the authors, partitions the plane into cells according to which of a given set of points is nearest, and has become a central construct in computational geometry.
Abstract: Computational geometry is concerned with the design and analysis of algorithms for geometrical problems. In addition, other more practically oriented areas of computer science, such as computer graphics, computer-aided design, robotics, pattern recognition, and operations research, give rise to problems that inherently are geometrical. This is one reason computational geometry has attracted enormous research interest in the past decade and is a well-established area today. (For standard sources, we refer to the survey article by Lee and Preparata [1984] and to the textbooks by Preparata and Shamos [1985] and Edelsbrunner [1987b].) Readers familiar with the literature of computational geometry will have noticed, especially in the last few years, an increasing interest in a geometrical construct called the Voronoi diagram. This trend can also be observed in combinatorial geometry and in a considerable number of articles in natural science journals that address the Voronoi diagram under different names specific to the respective area. Given some number of points in the plane, their Voronoi diagram divides the plane according to the nearest-neighbor…
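
For concreteness, a small sketch of the construct, assuming scipy's off-the-shelf Voronoi routine rather than anything from the surveyed literature: each site owns the cell of points closer to it than to any other site, which is exactly why locating a query's cell answers the nearest-neighbor question.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
sites = rng.random((10, 2))          # sites in the plane
vor = Voronoi(sites)

print(vor.vertices.shape)            # coordinates of the Voronoi vertices
print(vor.ridge_points[:5])          # pairs of sites separated by each edge

# Locating the cell containing a query is equivalent to a nearest-neighbor
# query; a brute-force check makes the equivalence explicit.
q = np.array([0.5, 0.5])
print(int(np.argmin(np.linalg.norm(sites - q, axis=1))))
```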

4,236 citations


"Data structures and algorithms for ..." refers methods in this paper

  • ...More recently, the Voronoi diagram [21] has provided a useful tool in low-dimensional Euclidian settings, and the overall field and outlook of Computational Geometry has yielded many interesting results such as those of [22, 23, 24, 25] and earlier [26]....

    [...]

Journal ArticleDOI
TL;DR: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.
Abstract: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods.
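
The usage pattern described (organize the file once, then answer m-nearest-neighbor queries cheaply) maps directly onto modern kd-tree libraries. A minimal sketch using scipy's cKDTree, which follows the same family of ideas but is not the authors' exact algorithm:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
records = rng.random((100_000, 8))     # N = 100000 records, k = 8 real keys
tree = cKDTree(records)                # one-time organization of the file

query = rng.random(8)
dists, idx = tree.query(query, k=5)    # the m = 5 closest matches
print(idx, dists)
```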

2,910 citations

Journal ArticleDOI
TL;DR: The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors, by eliminating the necesssity of calculating many distances.
Abstract: Computation of the k-nearest neighbors generally requires a large number of expensive distance computations. The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors, by eliminating the necessity of calculating many distances. Experimental results demonstrate the efficiency of the algorithm. Typically, an average of only 61 distance computations were made to find the nearest neighbor of a test sample among 1000 design samples.
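
The elimination principle can be sketched as follows. This is not the Fukunaga-Narendra algorithm itself, which clusters the samples and applies two elimination rules; it is a simplified illustration with invented names of how a node's mean and radius let the triangle inequality rule out whole branches without computing any distances inside them.

```python
import numpy as np

class Cluster:
    """A node in a simple two-way hierarchical decomposition: the sample
    mean, the radius covering the node's points, and optional children."""
    def __init__(self, points, leaf_size=16):
        self.points = points
        self.mean = points.mean(axis=0)
        self.radius = float(np.linalg.norm(points - self.mean, axis=1).max())
        self.children = []
        if len(points) > leaf_size:
            dim = int(np.argmax(points.std(axis=0)))     # widest coordinate
            order = np.argsort(points[:, dim])
            half = len(points) // 2
            self.children = [Cluster(points[order[:half]], leaf_size),
                             Cluster(points[order[half:]], leaf_size)]

def bb_nearest(query, node, best=(None, np.inf)):
    """Branch-and-bound search: skip a whole cluster when the triangle
    inequality proves it cannot beat the best distance found so far."""
    d_mean = float(np.linalg.norm(query - node.mean))
    if d_mean - node.radius >= best[1]:
        return best                                      # elimination rule
    if not node.children:
        for p in node.points:                            # leaf: exact check
            d = float(np.linalg.norm(query - p))
            if d < best[1]:
                best = (p, d)
        return best
    for child in sorted(node.children,
                        key=lambda c: np.linalg.norm(query - c.mean)):
        best = bb_nearest(query, child, best)
    return best

data = np.random.default_rng(0).standard_normal((20_000, 5))
root = Cluster(data)
print(bb_nearest(data[123] + 0.01, root)[1])
```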

776 citations


"Data structures and algorithms for ..." refers background or methods in this paper

  • ...Fukunaga in [8, 9] exploits the triangle inequality to reduce distance computations searching a hierarchical decomposition of Euclidian Space....

    [...]

  • ...This contrasts with the coordinate aligned hyperplanar cuts of the kd-tree (See Figures 1 & 2), and the use of computed Euclidian cluster centroids in [8]....

    [...]

  • ...Comparisons with the experiments of [8] are more difficult because leaf level buckets are employed....

    [...]