What Is the Nearest Neighbor in High Dimensional Spaces

Proceedings Article (Open Access)

pp. 506-515

TL;DR: A new generalized notion of nearest neighbor search is identified as the relevant problem in high dimensional space and a quality criterion is used to select relevant dimensions (projections) with respect to the given query.

Abstract:

Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regard to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast to previous approaches, our new notion of nearest neighbor search does not treat all dimensions equally but uses a quality criterion to select relevant dimensions (projections) with respect to the given query. As an example of a useful quality criterion, we rate how well the data is clustered around the query point within the selected projection. We then propose an efficient and effective algorithm to solve the generalized nearest neighbor problem. Our experiments based on a number of real and synthetic data sets show that our new approach provides new insights into the nature of nearest neighbor search on high …
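The core idea of the abstract (score candidate projections by how well the data clusters around the query, then search only in the best one) can be illustrated with a small sketch. This is not the authors' algorithm. It assumes axis-parallel projections onto pairs of dimensions and a hypothetical quality score, namely the fraction of points within radius `r` of the query inside the projection. The function names `projection_quality` and `generalized_nn` and the parameters `r`, `k`, and `proj_dim` are illustrative only.

```python
import itertools
import math
import random


def projected_distance(p, q, dims):
    """Euclidean distance between p and q using only the coordinates in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def projection_quality(data, query, dims, r):
    """Hypothetical quality score: the fraction of points that lie within
    radius r of the query when only the dimensions in dims are considered."""
    close = sum(1 for p in data if projected_distance(p, query, dims) <= r)
    return close / len(data)


def generalized_nn(data, query, k=3, proj_dim=2, r=0.1):
    """Pick the axis-parallel projection with the highest quality score, then
    return it together with the indices of the k nearest neighbors inside it."""
    best_dims = max(
        itertools.combinations(range(len(query)), proj_dim),
        key=lambda dims: projection_quality(data, query, dims, r),
    )
    ranked = sorted(range(len(data)),
                    key=lambda i: projected_distance(data[i], query, best_dims))
    return best_dims, ranked[:k]


# Toy data: 20 points sit right next to the query in dimensions 0 and 1 but are
# random elsewhere; 200 further points are uniform noise in all 10 dimensions.
random.seed(0)
query = [0.5] * 10
cluster = [[0.5 + random.uniform(-0.001, 0.001), 0.5 + random.uniform(-0.001, 0.001)]
           + [random.random() for _ in range(8)] for _ in range(20)]
noise = [[random.random() for _ in range(10)] for _ in range(200)]
data = cluster + noise
dims, neighbors = generalized_nn(data, query)
print(dims, neighbors)
```

In the full 10-dimensional space the 20 clustered points are no closer to the query than the noise, because their 8 random coordinates dominate the distance; the projection score is what recovers them. Exhaustively enumerating projections is only feasible for small dimensionalities, which is why a practical method needs a smarter search over projections.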

Citations

Proceedings Article

Locality-sensitive hashing scheme based on p-stable distributions

Mayur Datar, +3 more

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the l_p norm, based on p-stable distributions, that improves the running time of the earlier algorithm for the l_2 norm and yields the first known provably efficient approximate NN algorithm for the case p < 1.
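The p-stable scheme hashes a vector v with h(v) = floor((a · v + b) / w), where the entries of a come from a p-stable distribution and b is uniform in [0, w). The sketch below is a minimal illustration for the l_2 case, where the standard Gaussian is the 2-stable distribution. The class names `PStableHash` and `LSHIndex` and the parameter values (`k` hashes per table, `n_tables` tables, bucket width `w`) are assumptions chosen for the example, not values from the cited paper.

```python
import math
import random
from collections import defaultdict


class PStableHash:
    """One hash h(v) = floor((a . v + b) / w). The entries of a are drawn from
    the standard Gaussian, which is 2-stable and therefore suits the l_2 norm;
    b is drawn uniformly from [0, w]."""

    def __init__(self, dim, w, rng):
        self.a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.b = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, v):
        dot = sum(ai * vi for ai, vi in zip(self.a, v))
        return math.floor((dot + self.b) / self.w)


class LSHIndex:
    """n_tables hash tables, each keyed by a tuple of k concatenated hashes."""

    def __init__(self, dim, k=4, n_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.funcs = [[PStableHash(dim, w, rng) for _ in range(k)]
                      for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _keys(self, v):
        return [tuple(h(v) for h in g) for g in self.funcs]

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(idx)

    def query(self, q):
        """Return the index of the closest point among q's bucket-mates, or None."""
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, []))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], q))


rng = random.Random(1)
index = LSHIndex(dim=16)
for _ in range(500):
    index.add([rng.uniform(0.0, 10.0) for _ in range(16)])
# A query that is a slightly perturbed copy of stored point 42.
q = [x + 0.01 for x in index.points[42]]
print(index.query(q))
```

Concatenating `k` hashes makes far points less likely to share a bucket, while using several tables makes near points more likely to collide in at least one; the answer is approximate because only bucket-mates are checked.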

Book Chapter

On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

Charu C. Aggarwal, +2 more

TL;DR: This paper examines the behavior of the commonly used L_k norm and shows that the problem of meaningfulness in high dimensionality is sensitive to the value of k; in particular, the Manhattan distance metric (L_1) is consistently preferable to the Euclidean distance metric (L_2) for high dimensional data mining applications.
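One way to observe the meaningfulness problem is the relative contrast (D_max - D_min) / D_min between the farthest and nearest point from a query: as dimensionality grows it shrinks, so "nearest" becomes less distinguishable from "farthest". The sketch below measures it for the L_1 and L_2 norms on uniform random data; the function names, data distribution, and sample sizes are assumptions for illustration, and it does not reproduce the cited paper's experiments.

```python
import random


def lk_distance(p, q, k):
    """Minkowski L_k distance between two equal-length sequences."""
    return sum(abs(a - b) ** k for a, b in zip(p, q)) ** (1.0 / k)


def relative_contrast(n_points, dim, k, seed=0):
    """(D_max - D_min) / D_min from a random query to n_points uniform points.
    With a fixed seed, every k sees the same query and the same points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [lk_distance([rng.random() for _ in range(dim)], query, k)
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 20, 200):
    contrasts = {k: round(relative_contrast(1000, dim, k), 3) for k in (1, 2)}
    print(dim, contrasts)
```

Because the seed fixes the data, the L_1 and L_2 columns for a given dimensionality are computed on identical points, so their difference reflects only the choice of norm.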

Journal Article

A review of content-based image retrieval systems in medical applications—clinical benefits and future directions

Henning Müller, +3 more

01 Feb 2004

International Journal of Medical Informa...

TL;DR: The goal is not, in general, to replace text-based retrieval methods as they exist at the moment but to complement them with visual search tools.

Journal Article

Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

Hans-Peter Kriegel, +2 more

23 Mar 2009

ACM Transactions on Knowledge Discovery ...

TL;DR: This survey tries to clarify the different problem definitions related to subspace clustering in general; the specific difficulties encountered in this field of research; the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.

Proceedings Article

Outlier detection for high dimensional data

Charu C. Aggarwal, +1 more

TL;DR: New techniques for outlier detection which find the outliers by studying the behavior of projections from the data set are discussed.
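Studying projections for outliers can be sketched by dividing each attribute into `phi` equi-depth ranges and asking whether a grid cell in a low-dimensional projection holds far fewer points than expected if the attributes were independent. The sketch scores a cell with a z-score, (n(cell) - N f^k) / sqrt(N f^k (1 - f^k)) with f = 1/phi and k projected dimensions. This scoring is modeled on the sparsity idea in that line of work, but treat the details as an assumption; `equi_depth_ranges`, `sparsity_coefficient`, and the toy data are hypothetical.

```python
import math
import random


def equi_depth_ranges(values, phi):
    """Assign each value a range id 0..phi-1 so each range holds ~len(values)/phi points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ids = [0] * len(values)
    for rank, i in enumerate(order):
        ids[i] = rank * phi // len(values)
    return ids


def sparsity_coefficient(data, dims, cell, phi):
    """z-score of the observed count in one grid cell of the projection onto dims,
    against N * f**k expected under independence (f = 1/phi, k = len(dims)).
    Strongly negative values mark abnormally sparse cells."""
    n = len(data)
    range_ids = [equi_depth_ranges([p[d] for p in data], phi) for d in dims]
    count = sum(1 for i in range(n)
                if all(range_ids[j][i] == cell[j] for j in range(len(dims))))
    f_k = (1.0 / phi) ** len(dims)
    expected = n * f_k
    std = math.sqrt(n * f_k * (1.0 - f_k))
    return (count - expected) / std


# Toy data: attribute 1 closely tracks attribute 0; attribute 2 is independent.
rng = random.Random(0)
data = []
for _ in range(1000):
    x = rng.random()
    data.append([x, x + rng.gauss(0.0, 0.05), rng.random()])

print(sparsity_coefficient(data, (0, 1), (0, 4), phi=5))  # low x0, high x1
print(sparsity_coefficient(data, (0, 1), (0, 0), phi=5))  # low x0, low x1
```

Because attributes 0 and 1 are strongly correlated, the "low x0, high x1" cell is empty and scores strongly negative, whereas a point that landed there would stand out as an outlier only in this 2-D projection.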

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul, +4 more

01 Oct 1990

Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

Proceedings Article

R-trees: a dynamic index structure for spatial searching

Antonin Guttman

TL;DR: Spatial applications need an index that retrieves data items quickly according to their spatial locations. A dynamic index structure called an R-tree is described which meets this need, algorithms for searching and updating it are given, and it is concluded that it is useful for current database systems in spatial applications.
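Two geometric operations drive an R-tree: search descends into every entry whose minimum bounding rectangle (MBR) overlaps the query window, and insertion descends into the entry whose MBR needs the least area enlargement to cover the new item, breaking ties by smaller area. The sketch below shows only those two decisions in 2-D, not a full tree with node splitting; the `Rect` class and `choose_subtree` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR) in 2-D."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other):
        """Smallest rectangle covering both self and other."""
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def intersects(self, other):
        """Overlap test used during search to decide which subtrees to visit."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


def choose_subtree(children, new_rect):
    """Insertion step: index of the child MBR whose area grows least when it is
    enlarged to cover new_rect; ties go to the child with the smaller area."""
    return min(range(len(children)),
               key=lambda i: (children[i].union(new_rect).area() - children[i].area(),
                              children[i].area()))


children = [Rect(0, 0, 2, 2), Rect(5, 5, 9, 9)]
print(choose_subtree(children, Rect(1, 1, 3, 3)))  # → 0
```

Here the first child grows from area 4 to 9 while the second would grow from 16 to 64, so the new rectangle goes under the first child.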

Book

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Gerard Salton

Journal Article

Multidimensional access methods

Volker Gaede, +1 more

01 Jun 1998

ACM Computing Surveys

TL;DR: The class of point access methods, which are used to search sets of points in two or more dimensions, is presented, together with a discussion of theoretical and experimental results concerning the relative performance of various approaches.

Proceedings Article

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

Roger Weber, +2 more

TL;DR: It is shown formally that partitioning and clustering techniques for similarity search in high-dimensional vector spaces (HDVSs) exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10.
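The sequential scan used as the baseline in that comparison is simple to state: compute the distance from the query to every stored vector once and keep the k smallest. The sketch below is a minimal exact k-nearest-neighbor scan under the Euclidean distance; `knn_scan` is a hypothetical name, and the data are random for illustration.

```python
import heapq
import math
import random


def knn_scan(data, query, k):
    """Exact k-nearest-neighbor search by a linear sequential scan: one Euclidean
    distance per stored vector, keeping the k smallest via a bounded heap.
    Returns indices ordered from nearest to farthest."""
    return heapq.nsmallest(k, range(len(data)),
                           key=lambda i: math.dist(data[i], query))


rng = random.Random(0)
data = [[rng.random() for _ in range(32)] for _ in range(2000)]
query = data[7]
print(knn_scan(data, query, 3)[0])  # → 7, the query point itself
```

Its cost is always N distance computations of d coordinates each, with no pruning; an index only pays off when it can skip a large share of the data, which is what partitioning methods stop being able to do at high dimensionality.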

Related Papers (5)

When Is "Nearest Neighbor" Meaningful?

Automatic subspace clustering of high dimensional data for data mining applications

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

A density-based algorithm for discovering clusters in large spatial databases with noise

Data Mining: Concepts and Techniques