Proceedings ArticleDOI

Data structures and algorithms for nearest neighbor search in general metric spaces

01 Jan 1993-pp 311-321
TL;DR: The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems in general metric spaces.
Abstract: We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are high-dimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kd-tree performance is compared.
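
The construction and search procedures summarized above can be sketched compactly. The following is a minimal illustration rather than the paper's implementation: names such as VPNode and build_vp_tree are invented for the example, the vantage point is chosen uniformly at random, the split is at the median distance to it, and the triangle inequality is used to prune subtrees during search.

```python
import random

class VPNode:
    """One node of a vantage-point tree: a point, the median distance
    (radius) that splits its subtree, and inside/outside children."""
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build_vp_tree(points, dist):
    if not points:
        return None
    i = random.randrange(len(points))
    vp, rest = points[i], points[:i] + points[i + 1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    pairs = [(dist(vp, p), p) for p in rest]
    mu = sorted(d for d, _ in pairs)[len(pairs) // 2]   # median distance
    inside = [p for d, p in pairs if d < mu]
    outside = [p for d, p in pairs if d >= mu]
    return VPNode(vp, mu, build_vp_tree(inside, dist), build_vp_tree(outside, dist))

def vp_nearest(node, query, dist, best=(None, float("inf"))):
    """Return (point, distance) of the best neighbor found, pruning
    subtrees that the triangle inequality proves cannot help."""
    if node is None:
        return best
    d = dist(query, node.point)
    if d < best[1]:
        best = (node.point, d)
    if d < node.radius:                       # query falls inside the ball
        best = vp_nearest(node.inside, query, dist, best)
        if d + best[1] >= node.radius:        # ball around query crosses mu
            best = vp_nearest(node.outside, query, dist, best)
    else:
        best = vp_nearest(node.outside, query, dist, best)
        if d - best[1] <= node.radius:
            best = vp_nearest(node.inside, query, dist, best)
    return best

# Example with a Euclidean metric; any metric obeying the triangle
# inequality can be substituted.
pts = [(random.random(), random.random()) for _ in range(1000)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
tree = build_vp_tree(pts, euclid)
print(vp_nearest(tree, (0.5, 0.5), euclid))
```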


Citations
Journal ArticleDOI
TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

6,447 citations

MonographDOI
01 Jan 2006
TL;DR: This coherent and comprehensive book unifies material from several sources, including robotics, control theory, artificial intelligence, and algorithms, into planning under differential constraints that arise when automating the motions of virtually any mechanical system.
Abstract: Planning algorithms are impacting technical disciplines and industries around the world, including robotics, computer-aided design, manufacturing, computer graphics, aerospace applications, drug design, and protein folding. This coherent and comprehensive book unifies material from several sources, including robotics, control theory, artificial intelligence, and algorithms. The treatment is centered on robot motion planning but integrates material on planning in discrete spaces. A major part of the book is devoted to planning under uncertainty, including decision theory, Markov decision processes, and information spaces, which are the “configuration spaces” of all sensor-based planning problems. The last part of the book delves into planning under differential constraints that arise when automating the motions of virtually any mechanical system. Developed from courses taught by the author, the book is intended for students, engineers, and researchers in robotics, artificial intelligence, and control theory as well as computer graphics, algorithms, and computational biology.

6,340 citations


Cites background from "Data structures and algorithms for ..."

  • ...For more information on efficient nearest-neighbor searching, see the recent survey [478], and [47, 48, 49, 53, 101, 232, 367, 479, 541, 761, 909, 989]....

    [...]

Proceedings ArticleDOI
23 May 1998
TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R^d, which require space that is only polynomial in n and d.
Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors' STOC'98 and FOCS'01 papers. It unifies, generalizes and simplifies the results from those papers.
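
This line of work is closely associated with locality-sensitive hashing. As an illustration of the general idea only, the sketch below uses a random-hyperplane hash family for cosine similarity, not the specific schemes of the STOC'98/FOCS'01 papers, and the function names are invented for the example.

```python
import numpy as np

def build_lsh_index(data, n_bits=16, n_tables=8, seed=0):
    """Hash every vector into n_tables tables, keyed by the signs of
    n_bits random projections; nearby vectors collide with high probability."""
    rng = np.random.default_rng(seed)
    planes = [rng.standard_normal((n_bits, data.shape[1])) for _ in range(n_tables)]
    tables = [dict() for _ in range(n_tables)]
    for idx, x in enumerate(data):
        for plane, table in zip(planes, tables):
            table.setdefault(tuple((plane @ x) > 0), []).append(idx)
    return planes, tables

def query_lsh(q, data, planes, tables):
    """Gather candidates that collide with q in any table, then rank the
    (small) candidate set by exact distance."""
    candidates = set()
    for plane, table in zip(planes, tables):
        candidates.update(table.get(tuple((plane @ q) > 0), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: np.linalg.norm(data[i] - q))

data = np.random.default_rng(1).standard_normal((10_000, 64))
planes, tables = build_lsh_index(data)
print(query_lsh(data[42] + 0.01, data, planes, tables))   # likely returns 42
```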

4,478 citations

Journal ArticleDOI
TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) are developed and shown to substantially accelerate t-SNE, making it possible to learn embeddings of data sets with millions of objects.
Abstract: The paper investigates the acceleration of t-SNE--an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots--using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
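
The Barnes-Hut step can be sketched as follows, as a simplified illustration rather than the authors' implementation: a 2-D quadtree is built over the embedding, and any cell that is small relative to its distance from the query point (ratio below a threshold theta) is treated as a single point at its centre of mass. The exact t-SNE gradient involves additional normalization, and the point's own term is not excluded here; all names are invented for the example.

```python
import numpy as np

THETA = 0.5  # accuracy/speed trade-off: smaller means more exact

class Cell:
    """Quadtree cell summarizing its points by count and centre of mass."""
    def __init__(self, center, half_size, points):
        self.com = points.mean(axis=0)
        self.n = len(points)
        self.size = 2.0 * half_size
        self.children = []
        if self.n > 1 and half_size > 1e-9:
            quads = {}
            for p in points:
                quads.setdefault((p[0] >= center[0], p[1] >= center[1]), []).append(p)
            for (gx, gy), pts in quads.items():
                off = np.array([0.5 if gx else -0.5, 0.5 if gy else -0.5])
                self.children.append(Cell(center + half_size * off,
                                          half_size / 2.0, np.array(pts)))

def repulsion(q, cell):
    """Accumulate approximate t-SNE-style repulsive terms for point q,
    treating well-separated cells as cell.n copies of their centre of mass."""
    diff = q - cell.com
    d2 = float(diff @ diff)
    if not cell.children or (d2 > 0.0 and cell.size < THETA * np.sqrt(d2)):
        w = 1.0 / (1.0 + d2)                 # Student-t kernel used by t-SNE
        return cell.n * w * w * diff, cell.n * w
    force, z = np.zeros_like(q, dtype=float), 0.0
    for child in cell.children:
        f, zc = repulsion(q, child)
        force, z = force + f, z + zc
    return force, z

# Build one tree over the 2-D embedding, then query it once per point.
emb = np.random.default_rng(0).standard_normal((5000, 2))
root = Cell(emb.mean(axis=0), np.abs(emb - emb.mean(axis=0)).max() + 1e-6, emb)
print(repulsion(emb[0], root))
```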

2,079 citations


Cites background or methods from "Data structures and algorithms for ..."

  • ...approximation of the similarities between the input objects using vantage-point trees (Yianilos, 1993), and subsequently, they approximate the forces between the points in the embedding with the help of either a Barnes-Hut algorithm (Barnes and Hut, 1986) or a dual-tree algorithm (Gray and Moore, 2001, 2003)....

    [...]

  • ...In a vantage-point tree, each node stores an input object and the radius of a (hyper)ball that is centered on this object (Yianilos, 1993)....

    [...]

  • ..., 2006), vantage-point trees (Yianilos, 1993), and trees constructed using hierarchical clustering (Fukunaga and Narendra, 1975; Brin, 1995; Nister and Stewenius, 2006)....

    [...]

Proceedings Article
06 Jul 2015
TL;DR: It is demonstrated on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedented low k-nearest neighbor document classification error rates.
Abstract: We present the Word Mover's Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local cooccurrences in sentences. The WMD distance measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document need to "travel" to reach the embedded words of another document. We show that this distance metric can be cast as an instance of the Earth Mover's Distance, a well studied transportation problem for which several highly efficient solvers have been developed. Our metric has no hyperparameters and is straight-forward to implement. Further, we demonstrate on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedented low k-nearest neighbor document classification error rates.
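
The casting of WMD as a transportation problem can be made concrete with a small sketch. This is an illustrative formulation only, not the authors' solver: documents are represented here by arbitrary embedding matrices and normalized word frequencies, and the linear program is handed to scipy's generic LP routine rather than a specialized Earth Mover's Distance solver.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(X1, w1, X2, w2):
    """Word Mover's Distance as a transportation LP. X1, X2 hold the word
    embeddings of two documents (one row per word); w1, w2 are their
    normalized word frequencies. Minimize sum_ij T_ij * ||x_i - x'_j||
    subject to row sums w1, column sums w2, and T >= 0."""
    n, m = len(w1), len(w2)
    cost = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2).ravel()
    A_rows = np.zeros((n, n * m))            # sum_j T_ij = w1[i]
    for i in range(n):
        A_rows[i, i * m:(i + 1) * m] = 1.0
    A_cols = np.zeros((m, n * m))            # sum_i T_ij = w2[j]
    for j in range(m):
        A_cols[j, j::m] = 1.0
    res = linprog(cost,
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([w1, w2]),
                  bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(0)
doc1, doc2 = rng.standard_normal((4, 50)), rng.standard_normal((6, 50))
print(wmd(doc1, np.full(4, 0.25), doc2, np.full(6, 1 / 6)))
```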

1,786 citations


Cites background from "Data structures and algorithms for ..."

  • ...The nearest neighbor search has a time complexity of O(p^2), and it can be sped up further by leveraging out-of-the-box tools for fast (approximate or exact) nearest neighbor retrieval (Garcia et al., 2008; Yianilos, 1993; Andoni & Indyk, 2006)....

    [...]

References
Book
01 Jan 1972
TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.
Abstract: This completely revised second edition presents an introduction to statistical pattern recognition. Pattern recognition in general covers a wide range of problems: it is applied to engineering problems, such as character readers and wave form analysis, as well as to brain modeling in biology and psychology. Statistical decision and estimation, which are the main subjects of this book, are regarded as fundamental to the study of pattern recognition. This book is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field. Each chapter contains computer projects as well as exercises.

10,526 citations

Journal ArticleDOI
TL;DR: The Voronoi diagram, as discussed by the authors, partitions the plane into cells according to which of a given set of points is nearest, and has become a central construct in computational geometry.
Abstract: Computational geometry is concerned with the design and analysis of algorithms for geometrical problems. In addition, other more practically oriented areas of computer science, such as computer graphics, computer-aided design, robotics, pattern recognition, and operations research, give rise to problems that inherently are geometrical. This is one reason computational geometry has attracted enormous research interest in the past decade and is a well-established area today. (For standard sources, we refer to the survey article by Lee and Preparata [1984] and to the textbooks by Preparata and Shamos [1985] and Edelsbrunner [1987b].) Readers familiar with the literature of computational geometry will have noticed, especially in the last few years, an increasing interest in a geometrical construct called the Voronoi diagram. This trend can also be observed in combinatorial geometry and in a considerable number of articles in natural science journals that address the Voronoi diagram under different names specific to the respective area. Given some number of points in the plane, their Voronoi diagram divides the plane according to the nearest-neighbor…
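
For concreteness, a small sketch of the construct, assuming scipy's off-the-shelf Voronoi routine rather than anything from the surveyed literature: each site owns the cell of points closer to it than to any other site, which is exactly why locating a query's cell answers the nearest-neighbor question.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
sites = rng.random((10, 2))          # sites in the plane
vor = Voronoi(sites)

print(vor.vertices.shape)            # coordinates of the Voronoi vertices
print(vor.ridge_points[:5])          # pairs of sites separated by each edge

# Locating the cell containing a query is equivalent to a nearest-neighbor
# query; a brute-force check makes the equivalence explicit.
q = np.array([0.5, 0.5])
print(int(np.argmin(np.linalg.norm(sites - q, axis=1))))
```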

4,236 citations


"Data structures and algorithms for ..." refers methods in this paper

  • ...More recently, the Voronoi diagram [21] has provided a useful tool in low-dimensional Euclidian settings, and the overall field and outlook of Computational Geometry has yielded many interesting results such as those of [22, 23, 24, 25] and earlier [26]....

    [...]

Journal ArticleDOI
TL;DR: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.
Abstract: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods.
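
The usage pattern described (organize the file once, then answer m-nearest-neighbor queries cheaply) maps directly onto modern kd-tree libraries. A minimal sketch using scipy's cKDTree, which follows the same family of ideas but is not the authors' exact algorithm:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
records = rng.random((100_000, 8))     # N = 100000 records, k = 8 real keys
tree = cKDTree(records)                # one-time organization of the file

query = rng.random(8)
dists, idx = tree.query(query, k=5)    # the m = 5 closest matches
print(idx, dists)
```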

2,910 citations

Journal ArticleDOI
TL;DR: The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors, by eliminating the necesssity of calculating many distances.
Abstract: Computation of the k-nearest neighbors generally requires a large number of expensive distance computations. The method of branch and bound is implemented in the present algorithm to facilitate rapid calculation of the k-nearest neighbors, by eliminating the necessity of calculating many distances. Experimental results demonstrate the efficiency of the algorithm. Typically, an average of only 61 distance computations were made to find the nearest neighbor of a test sample among 1000 design samples.
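
The elimination principle can be sketched as follows. This is not the Fukunaga-Narendra algorithm itself, which clusters the samples and applies two elimination rules; it is a simplified illustration with invented names of how a node's mean and radius let the triangle inequality rule out whole branches without computing any distances inside them.

```python
import numpy as np

class Cluster:
    """A node in a simple two-way hierarchical decomposition: the sample
    mean, the radius covering the node's points, and optional children."""
    def __init__(self, points, leaf_size=16):
        self.points = points
        self.mean = points.mean(axis=0)
        self.radius = float(np.linalg.norm(points - self.mean, axis=1).max())
        self.children = []
        if len(points) > leaf_size:
            dim = int(np.argmax(points.std(axis=0)))     # widest coordinate
            order = np.argsort(points[:, dim])
            half = len(points) // 2
            self.children = [Cluster(points[order[:half]], leaf_size),
                             Cluster(points[order[half:]], leaf_size)]

def bb_nearest(query, node, best=(None, np.inf)):
    """Branch-and-bound search: skip a whole cluster when the triangle
    inequality proves it cannot beat the best distance found so far."""
    d_mean = float(np.linalg.norm(query - node.mean))
    if d_mean - node.radius >= best[1]:
        return best                                      # elimination rule
    if not node.children:
        for p in node.points:                            # leaf: exact check
            d = float(np.linalg.norm(query - p))
            if d < best[1]:
                best = (p, d)
        return best
    for child in sorted(node.children,
                        key=lambda c: np.linalg.norm(query - c.mean)):
        best = bb_nearest(query, child, best)
    return best

data = np.random.default_rng(0).standard_normal((20_000, 5))
root = Cluster(data)
print(bb_nearest(data[123] + 0.01, root)[1])
```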

776 citations


"Data structures and algorithms for ..." refers background or methods in this paper

  • ...Fukunaga in [8, 9] exploits the triangle inequality to reduce distance computations searching a hierarchical decomposition of Euclidian Space....

    [...]

  • ...This contrasts with the coordinate aligned hyperplanar cuts of the kd-tree (See Figures 1 & 2), and the use of computed Euclidian cluster centroids in [8]....

    [...]

  • ...Comparisons with the experiments of [8] are more difficult because leaf level buckets are employed....

    [...]