scispace - formally typeset
Open AccessProceedings Article

What Is the Nearest Neighbor in High Dimensional Spaces

TLDR
A new generalized notion of nearest neighbor search is identified as the relevant problem in high dimensional space and a quality criterion is used to select relevant dimensions (projections) with respect to the given query.
Abstract
Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very di cult one, not only with regards to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast to previous approaches, our new notion of nearest neighbor search does not treat all dimensions equally but uses a quality criterion to select relevant dimensions (projections) with respect to the given query. As an example for a useful quality criterion, we rate how well the data is clustered around the query point within the selected projection. We then propose an e cient and e ective algorithm to solve the generalized nearest neighbor problem. Our experiments based on a number of real and synthetic data sets show that our new approach provides new insights into the nature of nearest neighbor search on high

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

Locality-sensitive hashing scheme based on p-stable distributions

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p<1.
Book ChapterDOI

On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

TL;DR: This paper examines the behavior of the commonly used L k norm and shows that the problem of meaningfulness in high dimensionality is sensitive to the value of k, which means that the Manhattan distance metric is consistently more preferable than the Euclidean distance metric for high dimensional data mining applications.
Journal ArticleDOI

A review of content-based image retrieval systems in medical applications—clinical benefits and future directions

TL;DR: The goal is not, in general, to replace text-based retrieval methods as they exist at the moment but to complement them with visual search tools.
Journal ArticleDOI

Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

TL;DR: This survey tries to clarify the different problem definitions related to subspace clustering in general; the specific difficulties encountered in this field of research; the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.
Proceedings ArticleDOI

Outlier detection for high dimensional data

TL;DR: New techniques for outlier detection which find the outliers by studying the behavior of projections from the data set are discussed.
References
More filters
Journal ArticleDOI

Basic Local Alignment Search Tool

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Proceedings ArticleDOI

R-trees: a dynamic index structure for spatial searching

TL;DR: A dynamic index structure called an R-tree is described which meets this need, and algorithms for searching and updating it are given and it is concluded that it is useful for current database systems in spatial applications.
Journal ArticleDOI

Multidimensional access methods

TL;DR: The class of point access methods, which are used to search sets of points in two or more dimensions, are presented and a discussion of theoretical and experimental results concerning the relative performance of various approaches are discussed.
Proceedings Article

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

TL;DR: It is shown formally that partitioning and clustering techniques for similarity search in HDVSs exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10.
Related Papers (5)