scispace - formally typeset
Topic: Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published on this topic, receiving 69,362 citations.


Papers
Journal ArticleDOI
TL;DR: This paper compares several families of space hashing functions in a real setup and reveals that an unstructured quantizer significantly improves the accuracy of LSH, as it more closely fits the data in the feature space.

327 citations

Journal ArticleDOI
TL;DR: This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to quickly find similar entries in large databases using a novel and interesting class of algorithms known as randomized algorithms.
Abstract: This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to quickly find similar entries in large databases. This approach belongs to a novel and interesting class of algorithms that are known as randomized algorithms. A randomized algorithm does not guarantee an exact answer but instead provides a high probability guarantee that it will return the correct answer or one close to it. By investing additional computational effort, the probability can be pushed as high as desired.

326 citations
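As a concrete illustration of the randomized-algorithm idea described above, here is a minimal sketch (not taken from the lecture note itself) using random-hyperplane LSH for cosine similarity: each random hyperplane contributes one signature bit, similar vectors agree on most bits, and adding more hyperplanes (more computational effort) sharpens the probability guarantee. All names and parameters are illustrative.

```python
import numpy as np

def simhash_signature(x, planes):
    """One bit per random hyperplane: which side of the plane x falls on."""
    return tuple(bool(b) for b in (planes @ x) > 0)

rng = np.random.default_rng(0)
dim, n_bits = 8, 16
planes = rng.standard_normal((n_bits, dim))   # shared random hyperplanes

a = rng.standard_normal(dim)
b = a + 0.05 * rng.standard_normal(dim)       # near-duplicate of a
c = rng.standard_normal(dim)                  # unrelated vector

def hamming(s, t):
    return sum(u != v for u, v in zip(s, t))

# Similar vectors collide on almost every bit; unrelated ones on about half,
# so a small Hamming radius retrieves near-duplicates with high probability.
h_near = hamming(simhash_signature(a, planes), simhash_signature(b, planes))
h_far = hamming(simhash_signature(a, planes), simhash_signature(c, planes))
```

Each bit collides with probability 1 - θ/π for vectors at angle θ, so the expected Hamming distance directly estimates the angle; more bits tighten that estimate.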

01 Jan 2001
TL;DR: This paper proposes a scheme that exploits a structured search algorithm, allowing databases containing over 100,000 songs to be searched, and shows that the proposed hashing is robust against severe compression, although bit errors do occur.
Abstract: Nowadays most audio content identification systems are based on watermarking technology. In this paper we present a different technology, referred to as robust audio hashing. By extracting robust features and translating them into a bit string, we get an object called a robust hash. Content can then be identified by comparing hash values of a received audio clip with the hash values of previously stored original audio clips. A distinguishing feature of the proposed hash scheme is its ability to extract a bit string for every so many milliseconds. More precisely, for every windowed time interval a hash value of 32 bits is computed by thresholding energy differences of several frequency bands. A sequence of 256 hash values, corresponding to approximately 3 seconds of audio, can uniquely identify a song. Experimental results show that the proposed scheme is robust against severe compression, but bit errors do occur. This implies that searching and matching is a non-trivial task for large databases. For instance, a brute force search approach is already prohibitive for databases containing hash values of more than 100 songs. Therefore we propose a scheme that exploits a structured search algorithm that allows searching databases containing over 100,000 songs.

317 citations
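The hash extraction described in the abstract (a 32-bit value per windowed interval, obtained by thresholding energy differences across frequency bands and consecutive frames) can be sketched as follows. This is a simplified illustration, not the paper's exact parameters: the band energies here are synthetic, and window sizes and band edges are assumptions.

```python
import numpy as np

def frame_hashes(band_energies):
    """band_energies: (frames, 33) per-band energies of overlapping windows.
    Bit m of frame n thresholds the energy difference of adjacent bands
    (m, m+1) compared between consecutive frames (n-1, n), so 33 bands
    yield a 32-bit hash value per frame."""
    d = band_energies[:, :-1] - band_energies[:, 1:]   # adjacent-band diffs
    bits = (d[1:] - d[:-1]) > 0                        # compare frame pairs
    # pack each row of 32 bits into a single 32-bit hash value
    shifts = np.arange(31, -1, -1, dtype=np.uint64)
    return (bits.astype(np.uint64) << shifts).sum(axis=1)

rng = np.random.default_rng(1)
energies = rng.random((257, 33))    # stand-in energies: 257 frames, 33 bands
hashes = frame_hashes(energies)     # 256 hash values ~ one 3-second block
```

Matching then compares a received block of 256 consecutive hash values against stored ones in Hamming distance, which tolerates the bit errors the abstract mentions.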

Posted Content
TL;DR: This work presents the first provably sublinear time algorithm for approximate Maximum Inner Product Search (MIPS), which is also the first hashing algorithm for searching with (un-normalized) inner product as the underlying similarity measure.
Abstract: We present the first provably sublinear time algorithm for approximate Maximum Inner Product Search (MIPS). Our proposal is also the first hashing algorithm for searching with (un-normalized) inner product as the underlying similarity measure. Finding hashing schemes for MIPS was considered hard. We formally show that the existing Locality Sensitive Hashing (LSH) framework is insufficient for solving MIPS, and then we extend the existing LSH framework to allow asymmetric hashing schemes. Our proposal is based on an interesting mathematical phenomenon in which inner products, after independent asymmetric transformations, can be converted into the problem of approximate near neighbor search. This key observation makes an efficient sublinear hashing scheme for MIPS possible. In the extended asymmetric LSH (ALSH) framework, we provide an explicit construction of a provably fast hashing scheme for MIPS. The proposed construction and the extended LSH framework could be of independent theoretical interest. Our proposed algorithm is simple and easy to implement. We evaluate the method, for retrieving inner products, in the collaborative filtering task of item recommendations on the Netflix and Movielens datasets.

299 citations
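The asymmetric transformation the abstract alludes to can be sketched numerically. In one known construction of this kind, the data side is padded with powers of its own norm and the query side with constants, so that squared Euclidean distance after the transforms equals a constant minus the (normalized) inner product, up to a vanishing term. The specific exponents, scaling constant, and variable names below are illustrative assumptions, not necessarily the paper's exact parameters.

```python
import numpy as np

def alsh_transforms(X, q, m=5, U=0.9):
    """Sketch of asymmetric MIPS transforms.
    Data side:  P(x) = [x, ||x||^2, ||x||^4, ..., ||x||^(2^m)], after
                scaling so every norm is at most U < 1.
    Query side: Q(q) = [q/||q||, 1/2, 1/2, ..., 1/2].
    Then ||Q - P(x)||^2 = const - 2*(q.x)/||q|| + ||x||^(2^(m+1)); the last
    term vanishes as m grows, turning MIPS into near-neighbor search."""
    X = X * (U / np.linalg.norm(X, axis=1).max())
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    P = np.hstack([X] + [norms ** (2 ** (i + 1)) for i in range(m)])
    Q = np.concatenate([q / np.linalg.norm(q), np.full(m, 0.5)])
    return P, Q

rng = np.random.default_rng(2)
X, q = rng.standard_normal((1000, 16)), rng.standard_normal(16)
P, Q = alsh_transforms(X, q)

# Check the identity numerically: this residual equals ||Q||^2 for every row,
# so ranking by distance matches ranking by inner product up to ||x||^64.
d2 = ((P - Q) ** 2).sum(axis=1)
qhat = q / np.linalg.norm(q)
resid = d2 + 2 * (P[:, :16] @ qhat) - np.linalg.norm(P[:, :16], axis=1) ** 64
```

Because the transformed problem is ordinary near-neighbor search, any standard LSH family for Euclidean distance can then index P and answer MIPS queries through Q.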

Journal ArticleDOI
TL;DR: A method that enables scalable similarity search for learned metrics, together with an indirect solution that enables metric learning and hashing in vector spaces whose high dimensionality makes it infeasible to learn an explicit transformation over the feature dimensions.
Abstract: We introduce a method that enables scalable similarity search for learned metrics. Given pairwise similarity and dissimilarity constraints between some examples, we learn a Mahalanobis distance function that captures the examples' underlying relationships well. To allow sublinear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit transformation over the feature dimensions. We demonstrate the approach applied to a variety of image data sets, as well as a systems data set. The learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.

281 citations
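Encoding a learned Mahalanobis metric into randomized hash functions can be sketched as follows: since d(x, y)^2 = (x - y)^T M (x - y) with M = G^T G is the ordinary Euclidean distance between Gx and Gy, one can apply standard random-hyperplane LSH after multiplying by G. This is a minimal illustrative sketch under that assumption, not the paper's full construction (which also covers the high-dimensional implicit case).

```python
import numpy as np

def mahalanobis_hash_bits(X, M, n_bits, rng):
    """LSH bits respecting a learned metric d(x,y)^2 = (x-y)^T M (x-y).
    Factor M = G^T G, then hash Gx with random-hyperplane LSH, so collision
    probability tracks similarity under the learned metric."""
    G = np.linalg.cholesky(M).T               # M = G^T G
    planes = rng.standard_normal((n_bits, X.shape[1]))
    return (X @ G.T @ planes.T) > 0           # bit j of row i: sign(r_j . G x_i)

rng = np.random.default_rng(3)
# toy "learned" metric: weight the first feature heavily
M = np.diag([9.0, 1.0, 1.0, 1.0])
X = rng.standard_normal((5, 4))
bits = mahalanobis_hash_bits(X, M, n_bits=32, rng=rng)
```

With such bit signatures, sublinear-time retrieval under the learned metric reduces to ordinary Hamming-space lookup, which is what makes the approach scale to very large databases.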


Network Information

Related Topics (5):
- Deep learning: 79.8K papers, 2.1M citations, 84% related
- Feature extraction: 111.8K papers, 2.1M citations, 83% related
- Convolutional neural network: 74.7K papers, 2M citations, 83% related
- Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
- Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139