
Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Proceedings ArticleDOI
17 Jan 2012
TL;DR: This paper characterizes the LSH-preserving transformations and, as an application, generalizes the well-known LSH for Jaccard set similarity (the minwise-independent permutations), obtaining LSHs for many set similarity measures used in practice.
Abstract: Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, i.e., are LSHable. In this paper we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A ∩ B|, |A Δ B|, |A ∪ B|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwise-independent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals.

13 citations
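As context for the minwise hashing mentioned above: for a truly random permutation, the probability that two sets share the same minimum equals their Jaccard similarity |A ∩ B| / |A ∪ B|. A minimal Python sketch, with universal hash functions standing in for random permutations and all parameters illustrative:

```python
# Minimal MinHash sketch (illustrative parameters). For a truly random
# permutation, Pr[min-hash(A) == min-hash(B)] = |A ∩ B| / |A ∪ B|; here
# universal hash functions of the form (a*x + b) mod p approximate that.
import random

def make_minhash(num_hashes=128, seed=0):
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a Mersenne prime, so (a*x + b) mod p mixes well
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]

    def signature(item_set):
        # One minimum per hash function stands in for one random permutation.
        return [min((a * hash(x) + b) % p for x in item_set) for a, b in params]

    return signature

sig = make_minhash()
A, B = {"lsh", "hashing", "jaccard"}, {"lsh", "hashing", "cosine"}
sa, sb = sig(A), sig(B)
est = sum(x == y for x, y in zip(sa, sb)) / len(sa)
print(f"estimated Jaccard ≈ {est:.2f}, exact = {len(A & B) / len(A | B):.2f}")
```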

Proceedings Article
01 Jan 2015
TL;DR: A Multi-label Least-Squares Hashing (MLSH) method is proposed for hashing multi-label data; it outperforms several state-of-the-art supervised and unsupervised hashing methods.
Abstract: Recently, hashing methods have attracted increasing attention for their effectiveness in large-scale data search, e.g., over image and video data. For different scenarios, unsupervised, supervised and semi-supervised hashing methods have been proposed. In particular, when semantic information is available, supervised hashing methods show better performance than unsupervised ones. In many practical applications one sample usually carries more than one label, the setting studied in multi-label learning; however, few supervised hashing methods consider this scenario. In this paper, we propose a Multi-label Least-Squares Hashing (MLSH) method for multi-label data hashing. It works directly on multi-label data; moreover, unlike other hashing methods that learn hashing functions on the original data, MLSH first exploits the equivalent form of CCA and Least-Squares to project the original multi-label data into a lower-dimensional space; then, in that lower-dimensional space, it learns the projection matrix and obtains the final binary codes. MLSH is evaluated on NUS-WIDE and CIFAR-100, two benchmarks widely used for search tasks. The results show that MLSH outperforms several state-of-the-art hashing methods, both supervised and unsupervised.

13 citations
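The abstract's two-stage recipe (a least-squares projection into a lower-dimensional space, followed by binarization) can be sketched as follows. This is a hypothetical illustration of the recipe, not the paper's exact CCA-equivalent formulation; the ridge term and mean thresholding are assumptions.

```python
# Hypothetical two-stage sketch in the spirit of MLSH: a ridge-regularized
# least-squares projection toward the label matrix, then sign thresholding.
# An illustration of the recipe in the abstract, not the paper's exact
# CCA-equivalent formulation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))             # feature matrix (n x d)
Y = (rng.random((500, 10)) < 0.2) * 1.0    # multi-label matrix (n x c)

# Stage 1: least-squares map from features to labels (ridge term assumed).
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(64), X.T @ Y)
Z = X @ W                                  # lower-dimensional projection (n x c)

# Stage 2: binarize around the per-dimension mean to get 10-bit codes.
codes = (Z > Z.mean(axis=0)).astype(np.uint8)
print(codes[:3])
```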

Proceedings Article
Yadong Mu, Wei Liu, Cheng Deng, Zongting Lv, Xinbo Gao
09 Jul 2016
TL;DR: This paper develops a novel formulation and optimization scheme for cross-view hashing, attacking the problem by simultaneously capturing semantic neighboring relations and maximizing the generative probability of the learned hash codes in each view.
Abstract: Learning compact hash codes has been a vibrant research topic for large-scale similarity search, owing to the low storage cost and expedited search operation. A recent research thrust aims to learn compact codes jointly from multiple sources, referred to as cross-view (or cross-modal) hashing in the literature. The main theme of this paper is to develop a novel formulation and optimization scheme for cross-view hashing. As a key differentiator, our proposed method directly conducts optimization on discrete binary hash codes, rather than on relaxed continuous variables as in existing cross-view hashing methods. In this way, the loss of search accuracy induced by relaxation can be avoided. We attack the cross-view hashing problem by simultaneously capturing semantic neighboring relations and maximizing the generative probability of the learned hash codes in each view. Specifically, to enable effective optimization on discrete hash codes, the optimization proceeds in a block coordinate descent fashion: each iteration sequentially updates a single bit with the others clamped. We transform the resultant sub-problem into an equivalent, more tractable quadratic form and devise an active-set-based solver on the discrete codes. Rigorous theoretical analysis is provided for the convergence and local optimality condition. Comprehensive evaluations on three image benchmarks clearly demonstrate the merits of the proposed method.

13 citations
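The single-bit block coordinate descent described in the abstract can be illustrated on a toy objective: flip one bit at a time with the others clamped, keeping a flip only if it lowers the loss. The objective below (fitting ±1 similarities with code inner products) and all names are hypothetical stand-ins, not the paper's cross-view formulation.

```python
# Toy single-bit coordinate descent over binary codes B in {-1,+1}^(n x r):
# flip one bit with the others clamped, keep the flip only if the loss drops.
# The objective (fitting ±1 similarities with code inner products) is a
# hypothetical stand-in for the paper's cross-view formulation.
import numpy as np

rng = np.random.default_rng(1)
n, r = 40, 8
G = rng.normal(size=(n, n))
S = np.sign(G + G.T + 1e-9)              # symmetric ±1 similarity targets
B = np.sign(rng.normal(size=(n, r)))     # random ±1 initial codes

def loss(B):
    # How well do r-bit code inner products reproduce the targets?
    return np.linalg.norm(S - (B @ B.T) / r) ** 2

for sweep in range(3):
    for i in range(n):
        for k in range(r):
            before = loss(B)
            B[i, k] *= -1                # tentative single-bit flip
            if loss(B) >= before:        # revert unless it strictly helps
                B[i, k] *= -1
    print(f"sweep {sweep}: loss = {loss(B):.1f}")
```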

Posted Content
TL;DR: In this paper, the authors propose a scalable clustering algorithm based on Locality Sensitive Hashing (LSH) to approximate the density gradient ascent in Mean Shift clustering.
Abstract: In this paper we target the class of modal clustering methods, where clusters are defined in terms of the local modes of the probability density function that generates the data. The most well-known modal clustering method is k-means. Mean Shift clustering generalizes k-means by computing arbitrarily shaped clusters, defined as the basins of attraction of the local modes reached by density gradient ascent paths. Despite its potential, Mean Shift is a computationally expensive method for unsupervised learning. We therefore introduce two contributions aiming at clustering algorithms with linear time complexity, as opposed to the quadratic time complexity of exact Mean Shift clustering. First, we propose a scalable procedure to approximate the density gradient ascent. Second, we present a scalable cluster-labeling technique. Both propositions are based on Locality Sensitive Hashing (LSH) to approximate nearest neighbors, and are suited to moderately sized datasets. Furthermore, we show that using our approximation of the density gradient ascent as a pre-processing step in other clustering methods can also improve dedicated classification metrics. For the latter, a distributed implementation written for the Spark/Scala ecosystem is proposed. For all the considered clustering methods, we present experimental results illustrating their labeling accuracy and their potential to solve concrete problems.

13 citations
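The core idea of approximating density gradient ascent with LSH can be sketched as follows: restrict each mean-shift update to points sharing an LSH bucket (here, E2LSH-style random projections for Euclidean distance) instead of scanning all n points. Bucket width, number of projections, and data are illustrative assumptions; the paper's actual procedure differs in detail.

```python
# Hypothetical sketch: one approximate mean-shift step that averages only
# over points in the same LSH bucket (E2LSH-style random projections,
# floor((a·x + b) / w)), instead of over all n points.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(100, 2)) for loc in ((0, 0), (3, 3))])

w, k = 1.0, 3                              # bucket width and projections (assumed)
A = rng.normal(size=(k, 2))
b = rng.uniform(0, w, size=k)
keys = [tuple(row) for row in np.floor((X @ A.T + b) / w).astype(int)]

buckets = defaultdict(list)
for idx, key in enumerate(keys):
    buckets[key].append(idx)

# Move each point to the mean of its bucket: a crude, linear-time stand-in
# for one density-gradient-ascent step over approximate neighbors.
X_next = np.vstack([X[buckets[key]].mean(axis=0) for key in keys])
print("mean displacement:", np.linalg.norm(X_next - X, axis=1).mean())
```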

Proceedings ArticleDOI
27 May 2018
TL;DR: This paper begins the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them, and extends existing LSH lower bounds, showing that they also hold in the asymmetric setting.
Abstract: Locality-sensitive hashing (LSH) is an important tool for managing high-dimensional noisy or uncertain data, for example in connection with data cleaning (similarity join) and noise-robust search (similarity search). However, for a number of problems the LSH framework is not known to yield good solutions, and instead ad hoc solutions have been designed for particular similarity and distance measures. For example, this is true for output-sensitive similarity search/join, and for indexes supporting annulus queries that aim to report a point close to a certain given distance from the query point. In this paper we initiate the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them. More precisely, given a distance space (X, dist) and a "collision probability function" (CPF) f: R -> [0,1], we seek a distribution over pairs of functions (h, g) such that for every pair of points x, y in X the collision probability is Pr[h(x) = g(y)] = f(dist(x, y)). Locality-sensitive hashing is the study of how fast a CPF can decrease as the distance grows. For many spaces, f can be made exponentially decreasing even if we restrict attention to the symmetric case where g = h. We show that the asymmetry achieved by having a pair of functions makes it possible to achieve CPFs that are, for example, increasing or unimodal, and show how this leads to principled solutions to problems not addressed by the LSH framework. This includes a novel application to privacy-preserving distance estimation. We believe that the DSH framework will find further applications in high-dimensional data management. To put the running time bounds of the proposed constructions into perspective, we show lower bounds for the performance of DSH constructions with increasing and decreasing CPFs under angular distance. Essentially, this shows that our constructions are tight up to lower-order terms. In particular, we extend existing LSH lower bounds, showing that they also hold in the asymmetric setting.

13 citations
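The role of asymmetry can already be seen with random-hyperplane hashing: for h(x) = sign(a·x) with Gaussian a, the symmetric choice g = h gives the decreasing CPF Pr[h(x) = h(y)] = 1 - θ/π in the angle θ, while g(y) = -h(y) gives the increasing CPF θ/π. The following sketch checks both empirically (parameters illustrative; this is not one of the paper's constructions).

```python
# Checking the symmetric vs. asymmetric CPFs for random-hyperplane hashing.
# Parameters are illustrative; this is not one of the paper's constructions.
import numpy as np

rng = np.random.default_rng(0)
d, trials = 16, 20000

x, y = rng.normal(size=d), rng.normal(size=d)
theta = np.arccos(np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1, 1))

a = rng.normal(size=(trials, d))          # random hyperplanes
h_x, h_y = np.sign(a @ x), np.sign(a @ y)
sym = np.mean(h_x == h_y)                 # ≈ 1 - theta/pi (decreasing CPF)
asym = np.mean(h_x == -h_y)               # ≈ theta/pi     (increasing CPF)
print(f"theta/pi = {theta/np.pi:.3f}, symmetric ≈ {sym:.3f}, asymmetric ≈ {asym:.3f}")
```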


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Feature extraction: 111.8K papers, 2.1M citations (83% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139