
Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Proceedings ArticleDOI
17 Jan 2012
TL;DR: This paper characterizes the LSH-preserving transformations and, as an application, generalizes the well-known LSH for Jaccard set similarity (the minwise-independent permutations), obtaining LSHs for many set similarity measures used in practice.
Abstract: Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, i.e., are LSHable. In this paper we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A ∩ B|, |A Δ B|, |A ∪ B|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwise-independent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals.

13 citations
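As context for the minwise hashing mentioned above: for a truly random permutation, the probability that two sets share the same minimum equals their Jaccard similarity |A ∩ B| / |A ∪ B|. A minimal Python sketch, with universal hash functions standing in for random permutations and all parameters illustrative:

```python
# Minimal MinHash sketch (illustrative parameters). For a truly random
# permutation, Pr[min-hash(A) == min-hash(B)] = |A ∩ B| / |A ∪ B|; here
# universal hash functions of the form (a*x + b) mod p approximate that.
import random

def make_minhash(num_hashes=128, seed=0):
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a Mersenne prime, so (a*x + b) mod p mixes well
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]

    def signature(item_set):
        # One minimum per hash function stands in for one random permutation.
        return [min((a * hash(x) + b) % p for x in item_set) for a, b in params]

    return signature

sig = make_minhash()
A, B = {"lsh", "hashing", "jaccard"}, {"lsh", "hashing", "cosine"}
sa, sb = sig(A), sig(B)
est = sum(x == y for x, y in zip(sa, sb)) / len(sa)
print(f"estimated Jaccard ≈ {est:.2f}, exact = {len(A & B) / len(A | B):.2f}")
```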

Proceedings Article
01 Jan 2015
TL;DR: A Multi-label Least-Squares Hashing (MLSH) method is proposed for hashing multi-label data; it outperforms several state-of-the-art supervised and unsupervised hashing methods.
Abstract: Recently, hashing methods have attracted increasing attention for their effectiveness in large-scale data search, e.g., over image and video data. For different scenarios, unsupervised, supervised and semi-supervised hashing methods have been proposed. In particular, when semantic information is available, supervised hashing methods show better performance than unsupervised ones. In many practical applications one sample usually carries more than one label, the setting studied in multi-label learning; however, few supervised hashing methods consider this scenario. In this paper, we propose a Multi-label Least-Squares Hashing (MLSH) method for multi-label data hashing. It works directly on multi-label data; moreover, unlike other hashing methods that learn hashing functions on the original data, MLSH first exploits the equivalent form of CCA and Least-Squares to project the original multi-label data into a lower-dimensional space; then, in that lower-dimensional space, it learns the projection matrix and obtains the final binary codes. MLSH is evaluated on NUS-WIDE and CIFAR-100, two benchmarks widely used for search tasks. The results show that MLSH outperforms several state-of-the-art hashing methods, both supervised and unsupervised.

13 citations
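The abstract's two-stage recipe (a least-squares projection into a lower-dimensional space, followed by binarization) can be sketched as follows. This is a hypothetical illustration of the recipe, not the paper's exact CCA-equivalent formulation; the ridge term and mean thresholding are assumptions.

```python
# Hypothetical two-stage sketch in the spirit of MLSH: a ridge-regularized
# least-squares projection toward the label matrix, then sign thresholding.
# An illustration of the recipe in the abstract, not the paper's exact
# CCA-equivalent formulation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))             # feature matrix (n x d)
Y = (rng.random((500, 10)) < 0.2) * 1.0    # multi-label matrix (n x c)

# Stage 1: least-squares map from features to labels (ridge term assumed).
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(64), X.T @ Y)
Z = X @ W                                  # lower-dimensional projection (n x c)

# Stage 2: binarize around the per-dimension mean to get 10-bit codes.
codes = (Z > Z.mean(axis=0)).astype(np.uint8)
print(codes[:3])
```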

Proceedings Article
Yadong Mu, Wei Liu, Cheng Deng, Zongting Lv, Xinbo Gao
09 Jul 2016
TL;DR: This paper develops a novel formulation and optimization scheme for cross-view hashing, attacking the problem by simultaneously capturing semantic neighboring relations and maximizing the generative probability of the learned hash codes in each view.
Abstract: Learning compact hash codes has been a vibrant research topic for large-scale similarity search, owing to the low storage cost and expedited search operation. A recent research thrust aims to learn compact codes jointly from multiple sources, referred to as cross-view (or cross-modal) hashing in the literature. The main theme of this paper is to develop a novel formulation and optimization scheme for cross-view hashing. As a key differentiator, our proposed method directly conducts optimization on discrete binary hash codes, rather than on relaxed continuous variables as in existing cross-view hashing methods. In this way, the loss of search accuracy induced by relaxation can be avoided. We attack the cross-view hashing problem by simultaneously capturing semantic neighboring relations and maximizing the generative probability of the learned hash codes in each view. Specifically, to enable effective optimization on discrete hash codes, the optimization proceeds in a block coordinate descent fashion: each iteration sequentially updates a single bit with the others clamped. We transform the resultant sub-problem into an equivalent, more tractable quadratic form and devise an active-set-based solver on the discrete codes. Rigorous theoretical analysis is provided for the convergence and local optimality condition. Comprehensive evaluations on three image benchmarks clearly demonstrate the merits of the proposed method.

13 citations
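The single-bit block coordinate descent described in the abstract can be illustrated on a toy objective: flip one bit at a time with the others clamped, keeping a flip only if it lowers the loss. The objective below (fitting ±1 similarities with code inner products) and all names are hypothetical stand-ins, not the paper's cross-view formulation.

```python
# Toy single-bit coordinate descent over binary codes B in {-1,+1}^(n x r):
# flip one bit with the others clamped, keep the flip only if the loss drops.
# The objective (fitting ±1 similarities with code inner products) is a
# hypothetical stand-in for the paper's cross-view formulation.
import numpy as np

rng = np.random.default_rng(1)
n, r = 40, 8
G = rng.normal(size=(n, n))
S = np.sign(G + G.T + 1e-9)              # symmetric ±1 similarity targets
B = np.sign(rng.normal(size=(n, r)))     # random ±1 initial codes

def loss(B):
    # How well do r-bit code inner products reproduce the targets?
    return np.linalg.norm(S - (B @ B.T) / r) ** 2

for sweep in range(3):
    for i in range(n):
        for k in range(r):
            before = loss(B)
            B[i, k] *= -1                # tentative single-bit flip
            if loss(B) >= before:        # revert unless it strictly helps
                B[i, k] *= -1
    print(f"sweep {sweep}: loss = {loss(B):.1f}")
```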

Posted Content
TL;DR: In this paper, the authors propose a scalable clustering algorithm based on Locality Sensitive Hashing (LSH) to approximate the density gradient ascent in Mean Shift clustering.
Abstract: In this paper we target the class of modal clustering methods, where clusters are defined in terms of the local modes of the probability density function that generates the data. The most well-known modal clustering method is k-means. Mean Shift clustering generalizes k-means by computing arbitrarily shaped clusters, defined as the basins of attraction of the local modes reached by density gradient ascent paths. Despite its potential, Mean Shift is a computationally expensive method for unsupervised learning. We therefore introduce two contributions aiming at clustering algorithms with linear time complexity, as opposed to the quadratic time complexity of exact Mean Shift clustering. First, we propose a scalable procedure to approximate the density gradient ascent. Second, we present a scalable cluster-labeling technique. Both propositions are based on Locality Sensitive Hashing (LSH) to approximate nearest neighbors, and are suited to moderately sized datasets. Furthermore, we show that using our approximation of the density gradient ascent as a pre-processing step in other clustering methods can also improve dedicated classification metrics. For the latter, a distributed implementation written for the Spark/Scala ecosystem is proposed. For all the considered clustering methods, we present experimental results illustrating their labeling accuracy and their potential to solve concrete problems.

13 citations
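The core idea of approximating density gradient ascent with LSH can be sketched as follows: restrict each mean-shift update to points sharing an LSH bucket (here, E2LSH-style random projections for Euclidean distance) instead of scanning all n points. Bucket width, number of projections, and data are illustrative assumptions; the paper's actual procedure differs in detail.

```python
# Hypothetical sketch: one approximate mean-shift step that averages only
# over points in the same LSH bucket (E2LSH-style random projections,
# floor((a·x + b) / w)), instead of over all n points.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(100, 2)) for loc in ((0, 0), (3, 3))])

w, k = 1.0, 3                              # bucket width and projections (assumed)
A = rng.normal(size=(k, 2))
b = rng.uniform(0, w, size=k)
keys = [tuple(row) for row in np.floor((X @ A.T + b) / w).astype(int)]

buckets = defaultdict(list)
for idx, key in enumerate(keys):
    buckets[key].append(idx)

# Move each point to the mean of its bucket: a crude, linear-time stand-in
# for one density-gradient-ascent step over approximate neighbors.
X_next = np.vstack([X[buckets[key]].mean(axis=0) for key in keys])
print("mean displacement:", np.linalg.norm(X_next - X, axis=1).mean())
```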

Proceedings ArticleDOI
27 May 2018
TL;DR: This paper begins the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them, and extends existing LSH lower bounds, showing that they also hold in the asymmetric setting.
Abstract: Locality-sensitive hashing (LSH) is an important tool for managing high-dimensional noisy or uncertain data, for example in connection with data cleaning (similarity join) and noise-robust search (similarity search). However, for a number of problems the LSH framework is not known to yield good solutions, and instead ad hoc solutions have been designed for particular similarity and distance measures. For example, this is true for output-sensitive similarity search/join, and for indexes supporting annulus queries that aim to report a point close to a certain given distance from the query point. In this paper we initiate the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them. More precisely, given a distance space (X, dist) and a "collision probability function" (CPF) f: R -> [0,1], we seek a distribution over pairs of functions (h, g) such that for every pair of points x, y in X the collision probability is Pr[h(x) = g(y)] = f(dist(x, y)). Locality-sensitive hashing is the study of how fast a CPF can decrease as the distance grows. For many spaces, f can be made exponentially decreasing even if we restrict attention to the symmetric case where g = h. We show that the asymmetry achieved by having a pair of functions makes it possible to achieve CPFs that are, for example, increasing or unimodal, and show how this leads to principled solutions to problems not addressed by the LSH framework. This includes a novel application to privacy-preserving distance estimation. We believe that the DSH framework will find further applications in high-dimensional data management. To put the running time bounds of the proposed constructions into perspective, we show lower bounds for the performance of DSH constructions with increasing and decreasing CPFs under angular distance. Essentially, this shows that our constructions are tight up to lower-order terms. In particular, we extend existing LSH lower bounds, showing that they also hold in the asymmetric setting.

13 citations
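The role of asymmetry can already be seen with random-hyperplane hashing: for h(x) = sign(a·x) with Gaussian a, the symmetric choice g = h gives the decreasing CPF Pr[h(x) = h(y)] = 1 - θ/π in the angle θ, while g(y) = -h(y) gives the increasing CPF θ/π. The following sketch checks both empirically (parameters illustrative; this is not one of the paper's constructions).

```python
# Checking the symmetric vs. asymmetric CPFs for random-hyperplane hashing.
# Parameters are illustrative; this is not one of the paper's constructions.
import numpy as np

rng = np.random.default_rng(0)
d, trials = 16, 20000

x, y = rng.normal(size=d), rng.normal(size=d)
theta = np.arccos(np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1, 1))

a = rng.normal(size=(trials, d))          # random hyperplanes
h_x, h_y = np.sign(a @ x), np.sign(a @ y)
sym = np.mean(h_x == h_y)                 # ≈ 1 - theta/pi (decreasing CPF)
asym = np.mean(h_x == -h_y)               # ≈ theta/pi     (increasing CPF)
print(f"theta/pi = {theta/np.pi:.3f}, symmetric ≈ {sym:.3f}, asymmetric ≈ {asym:.3f}")
```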


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Feature extraction: 111.8K papers, 2.1M citations (83% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139