
Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published on this topic, receiving 69,362 citations.


Papers
Proceedings ArticleDOI
03 Dec 2014
TL;DR: Experiments on various real-life datasets show that the performance of this new variant of randomized partition trees is superior to that of the previous variant as well as to the locality-sensitive hashing (LSH) method for nearest neighbor search.
Abstract: Recently, randomized partition trees have been shown theoretically to be very effective for high-dimensional nearest neighbor search. In this paper, we introduce a variant of randomized partition trees for the high-dimensional nearest neighbor search problem and provide theoretical justification for this choice. Experiments on various real-life datasets show that the performance of this new variant is superior to that of the previous variant as well as to the locality-sensitive hashing (LSH) method for nearest neighbor search. In addition, we establish the connection between various recently introduced notions of difficulty in the nearest neighbor search problem, namely the potential function and relative contrast.

12 citations
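For context, the basic randomized partition tree behind this line of work is easy to sketch: split the point set recursively with random hyperplanes until the leaves are small, then route a query to its leaf and scan it. The Python sketch below shows only this generic construction, not the paper's specific variant; the median split rule and the leaf size are illustrative assumptions.

```python
# Minimal sketch of a randomized partition (RP) tree for nearest neighbor
# search. Illustrates the generic construction only, not the paper's variant.
import numpy as np

class RPTree:
    def __init__(self, points, leaf_size=16, rng=None):
        self.rng = rng or np.random.default_rng()
        self.points = points
        self.root = self._build(np.arange(len(points)), leaf_size)

    def _build(self, idx, leaf_size):
        if len(idx) <= leaf_size:
            return ("leaf", idx)
        # Draw a random projection direction; split at the median projection.
        direction = self.rng.standard_normal(self.points.shape[1])
        proj = self.points[idx] @ direction
        median = np.median(proj)
        left, right = idx[proj <= median], idx[proj > median]
        if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
            return ("leaf", idx)
        return ("node", direction, median,
                self._build(left, leaf_size), self._build(right, leaf_size))

    def query(self, q):
        node = self.root
        while node[0] == "node":
            _, direction, median, lo, hi = node
            node = lo if q @ direction <= median else hi
        # Brute-force scan within the leaf the query landed in.
        leaf_idx = node[1]
        dists = np.linalg.norm(self.points[leaf_idx] - q, axis=1)
        return leaf_idx[np.argmin(dists)]

rng = np.random.default_rng(0)
pts = rng.standard_normal((1000, 20))
tree = RPTree(pts, rng=rng)
print(tree.query(pts[0]))  # should return index 0
```

In practice several such trees are built and the best candidate found across all of them is returned, which is where the randomization pays off.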

Proceedings ArticleDOI
19 Oct 2009
TL;DR: Experimental results reveal the advantages of the proposed music similarity measure and music search system in effectiveness, efficiency, adaptiveness, and scalability.
Abstract: How to measure and model the similarity between different music items is one of the most fundamental yet challenging research problems in music information retrieval. This paper demonstrates a novel multimodal and adaptive music similarity measure (CompositeMap) with its application in a personalized multimodal music search system. CompositeMap can effectively combine music properties from different aspects into compact signatures via supervised learning, which lays the foundation for effective and efficient music search. In addition, an incremental Locality Sensitive Hashing algorithm is developed to support more efficient search. Experimental results based on two large music collections reveal various advantages of the proposed music similarity measure and music search system in effectiveness, efficiency, adaptiveness, and scalability.

12 citations
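The paper's CompositeMap signatures and incremental algorithm are not detailed above, so the sketch below only illustrates the generic pattern it builds on: a random-hyperplane LSH index for cosine similarity that admits one-at-a-time insertion. The class name and all parameters are assumptions for illustration, not the authors' implementation.

```python
# Generic LSH index with incremental insertion, using standard
# random-hyperplane hashing for cosine similarity. A sketch, not the
# paper's CompositeMap / incremental LSH algorithm.
import numpy as np
from collections import defaultdict

class IncrementalLSH:
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        # One random hyperplane per bit, per table.
        self.planes = [rng.standard_normal((n_bits, dim))
                       for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []

    def _keys(self, vec):
        # Sign pattern of the projections -> one bit-string key per table.
        return [tuple((p @ vec > 0).astype(int)) for p in self.planes]

    def insert(self, vec):
        # Incremental: a new item only touches its own n_tables buckets.
        item_id = len(self.items)
        self.items.append(vec)
        for table, key in zip(self.tables, self._keys(vec)):
            table[key].append(item_id)
        return item_id

    def query(self, vec, k=5):
        # Collect colliding candidates, then rank them by cosine similarity.
        candidates = set()
        for table, key in zip(self.tables, self._keys(vec)):
            candidates.update(table[key])
        if not candidates:
            return []
        cands = list(candidates)
        mat = np.array([self.items[i] for i in cands])
        sims = mat @ vec / (np.linalg.norm(mat, axis=1)
                            * np.linalg.norm(vec) + 1e-12)
        return [cands[i] for i in np.argsort(-sims)[:k]]
```

Multiple tables trade memory for recall: a near neighbor only needs to collide with the query in one of them.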

Posted Content
TL;DR: The Johnson-Lindenstrauss Lemma is generalized to define "low-quality" mappings to a Euclidean space of significantly lower dimension, such that they satisfy a requirement weaker than approximately preserving all distances or even preserving the nearest neighbor.
Abstract: The approximate nearest neighbor problem ($\epsilon$-ANN) in Euclidean settings is a fundamental question that has been addressed by two main approaches: data-dependent space partitioning techniques, which perform well when the dimension is relatively low but are affected by the curse of dimensionality, and locality sensitive hashing, which has polynomial dependence on the dimension, sublinear query time with an exponent inversely proportional to the error factor $\epsilon$, and subquadratic space requirements. We generalize the Johnson-Lindenstrauss lemma to define "low-quality" mappings to a Euclidean space of significantly lower dimension, such that they satisfy a requirement weaker than approximately preserving all distances or even preserving the nearest neighbor. This mapping guarantees, with arbitrarily high probability, that an approximate nearest neighbor lies among the $k$ approximate nearest neighbors in the projected space. This leads to a randomized tree-based data structure that avoids the curse of dimensionality for $\epsilon$-ANN. Our algorithm, given $n$ points in dimension $d$, achieves space usage in $O(dn)$, preprocessing time in $O(dn\log n)$, and query time in $O(d n^{\rho}\log n)$, where $\rho$ is proportional to $1 - 1/\ln \ln n$, for fixed $\epsilon \in (0,1)$. It employs a data structure, such as BBD-trees, that efficiently finds $k$ approximate nearest neighbors. The dimension reduction is larger if one assumes that point sets possess some structure, namely a bounded expansion rate. We implement our method and present experimental results in up to 500 dimensions and $10^5$ points, which show that the practical performance is better than predicted by the theoretical analysis. In addition, we compare our approach with E2LSH.

12 citations
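The abstract describes a project-then-refine pipeline: map the points into a much lower dimension with a Gaussian Johnson-Lindenstrauss-style projection, retrieve the $k$ approximate nearest neighbors there, and re-rank those candidates in the original space. The sketch below follows that outline under simplifying assumptions: it brute-forces the k-nearest search in the projected space where the paper uses a structure such as BBD-trees, and the target dimension and k are arbitrary choices.

```python
# Project-then-refine sketch of the JL-based ANN idea: search in a
# low-dimensional random projection, then re-rank candidates in the
# original space. Parameters here are illustrative only.
import numpy as np

def jl_project(points, target_dim, rng):
    # Gaussian random map, scaled so squared norms are preserved
    # in expectation.
    d = points.shape[1]
    proj = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return points @ proj, proj

def ann_query(points, low, proj, q, k=10):
    q_low = q @ proj
    # Find the k nearest neighbors in the projected space (brute force
    # here; the paper uses an efficient structure such as BBD-trees) ...
    near = np.argsort(np.linalg.norm(low - q_low, axis=1))[:k]
    # ... then return the best of those candidates in the original space.
    dists = np.linalg.norm(points[near] - q, axis=1)
    return near[np.argmin(dists)]

rng = np.random.default_rng(0)
pts = rng.standard_normal((10_000, 500))
low, proj = jl_project(pts, target_dim=32, rng=rng)
query = rng.standard_normal(500)
print(ann_query(pts, low, proj, query))
```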

Journal ArticleDOI
TL;DR: In this article, a data structure for answering approximate nearest neighbor queries in high-dimensional Euclidean space is proposed, based on the technique of Indyk (SODA 2003); it stores random projections to provide sublinear query time.

12 citations

Journal ArticleDOI
TL;DR: An adaptive discrete cyclic coordinate descent (ACC) method is proposed to effectively solve the discrete optimization problem; it achieves a speed-up over state-of-the-art methods while offering on-par, and in some cases better, performance.

12 citations
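Only the method's name appears above, so as a rough illustration: discrete cyclic coordinate descent over binary variables sweeps the coordinates in turn and keeps any single-bit flip that lowers the objective. The quadratic objective in this sketch is a stand-in chosen for concreteness, not the paper's formulation.

```python
# Generic discrete cyclic coordinate descent over binary codes b in
# {-1, +1}^n: cycle through the bits and keep any flip that lowers the
# objective. f(b) = b^T A b - 2 c^T b is an illustrative stand-in.
import numpy as np

def cyclic_coordinate_descent(A, c, n_sweeps=10, rng=None):
    rng = rng or np.random.default_rng(0)
    n = len(c)
    b = rng.choice([-1.0, 1.0], size=n)  # random initial binary code

    def f(b):
        return b @ A @ b - 2 * c @ b

    for _ in range(n_sweeps):
        changed = False
        for i in range(n):            # one cyclic sweep over all bits
            current = f(b)
            b[i] = -b[i]              # tentatively flip bit i
            if f(b) < current:
                changed = True        # keep the improving flip
            else:
                b[i] = -b[i]          # revert: no improvement
        if not changed:               # converged: no single flip helps
            break
    return b

A = np.array([[1.0, 0.5], [0.5, 1.0]])
c = np.array([0.3, -0.7])
print(cyclic_coordinate_descent(A, c))
```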


Network Information

Related Topics (5)

Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139