When is nearest neighbor meaningful

Open Access · Journal Article

Kevin S. Beyer, +3 more

- 01 Jan 1999 -

Lecture Notes in Computer Science

- pp 217-235

TLDR

In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.

Abstract:

We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!
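The effect described in the abstract can be observed with a small simulation. The sketch below is not the authors' experimental setup; it assumes points drawn uniformly and independently in the unit hypercube (the simplest IID case, narrower than the conditions the paper covers), Euclidean distance, and NumPy as the numerical library. The function names `relative_contrast` and `mean_contrast` are hypothetical. Each query is answered by a plain linear scan, and the reported quantity is (DMAX - DMIN) / DMIN, which should shrink toward zero as the nearest and farthest distances converge.

```python
import numpy as np


def relative_contrast(n_points, dim, rng):
    """Return (DMAX - DMIN) / DMIN for one random query point.

    Assumption: data and query are uniform in [0, 1]^dim with
    independent coordinates (illustrative only).
    """
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    # Linear scan: Euclidean distance from the query to every data point.
    dists = np.linalg.norm(data - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


def mean_contrast(dim, n_points=1000, n_queries=20, seed=0):
    """Average the relative contrast over several random queries."""
    rng = np.random.default_rng(seed)
    values = [relative_contrast(n_points, dim, rng) for _ in range(n_queries)]
    return float(np.mean(values))


if __name__ == "__main__":
    # Print the mean relative contrast for a range of dimensionalities.
    for dim in (2, 10, 15, 100, 1000):
        print(dim, round(mean_contrast(dim), 3))
```

Under these assumptions the printed contrast should fall steadily as `dim` grows. Data with strong cluster structure or low intrinsic dimensionality may behave differently, which is consistent with the abstract's remark that some high-dimensional workloads do not show the effect.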

Citations

Book

Data Mining: Concepts and Techniques

Jiawei Han, +2 more

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Journal Article

Survey of clustering algorithms

Rui Xu, +1 more

- 01 May 2005 -

IEEE Transactions on Neural Networks

TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.

Proceedings Article

Locality-sensitive hashing scheme based on p-stable distributions

Mayur Datar, +3 more

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p<1.

Book Chapter

A Survey of Clustering Data Mining Techniques

Pavel Berkhin

TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.

Journal Article

Product Quantization for Nearest Neighbor Search

Hervé Jégou, +2 more

- 01 Jan 2011 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper introduces a product quantization-based approach for approximate nearest neighbor search to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately.

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul, +4 more

- 01 Oct 1990 -

Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

Journal Article

Texture features for browsing and retrieval of image data

B.S. Manjunath, +1 more

- 01 Aug 1996 -

IEEE Transactions on Pattern Analysis an...

TL;DR: Comparisons with other multiresolution texture features using the Brodatz texture database indicate that the Gabor features provide the best pattern retrieval accuracy.
