Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Data-Dependent Hashing Based on p-Stable Distribution

[...]

Xiao Bai¹, Haichuan Yang¹, Jun Zhou², Peng Ren³, Jian Cheng⁴ - Show less +1 more•Institutions (4)

Beihang University¹, Griffith University², China University of Petroleum³, Chinese Academy of Sciences⁴

27 Aug 2014-IEEE Transactions on Image Processing

TL;DR: This paper begins by formulating the Euclidean distance preserving property in terms of variance estimation, and develops a projection method, which maps the original data to arbitrary dimensional vectors, which results in a supervised hashing scheme, which preserves semantic similarity of data.

...read moreread less

Abstract: The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this property, we develop a projection method, which maps the original data to arbitrary dimensional vectors. Each projection vector is a linear combination of multiple random vectors subject to p-stable distribution, in which the weights for the linear combination are learned based on the training data. An orthogonal matrix is then learned data-dependently for minimizing the thresholding error in quantization. Combining the projection method and orthogonal matrix, we develop an unsupervised hashing scheme, which preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into consideration and gives more accurate hashing results with compact hash codes. Different from many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted by the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This results in a supervised hashing scheme, which preserves semantic similarity of data. Experimental results show that our methods have outperformed several state-of-the-art hashing approaches in both effectiveness and efficiency.

...read moreread less

44 citations

Proceedings Article•DOI•

Supervised Hashing with Pseudo Labels for Scalable Multimedia Retrieval

[...]

Jingkuan Song¹, Lianli Gao², Yan Yan¹, Dongxiang Zhang³, Nicu Sebe¹ - Show less +1 more•Institutions (3)

University of Trento¹, University of Electronic Science and Technology of China², National University of Singapore³

13 Oct 2015

TL;DR: A Supervised Hashing with Pseudo Labels (SHPL) which uses the cluster centers of the training data to generate pseudo labels, based on which the hash codes can be generated using the criteria of supervised hashing, and it is proved that the pseudo labels and the hash code can be jointly learned and iteratively updated in an unified framework.

...read moreread less

Abstract: There is an increasing interest in using hash codes for efficient multimedia retrieval and data storage. The hash functions are learned in such a way that the hash codes can preserve essential properties of the original space or the label information. Then the Hamming distance of the hash codes can approximate the data similarity. Existing works have demonstrated the success of many supervised hashing models. However, labeling data is time and labor consuming, especially for scalable datasets. In order to utilize the supervised hashing models to improve the discriminative power of hash codes, we propose a Supervised Hashing with Pseudo Labels (SHPL) which uses the cluster centers of the training data to generate pseudo labels, based on which the hash codes can be generated using the criteria of supervised hashing. More specifically, we utilize linear discriminant analysis (LDA) with trace ratio criterion as a showcase for hash functions learning and during the optimization, we prove that the pseudo labels and the hash codes can be jointly learned and iteratively updated in an unified framework. The learned hash functions can harness the discriminant power of trace ratio criterion, and thus can achieve better performance. Experimental results on three large-scale unlabeled datasets (i.e., SIFT1M, GIST1M, and SIFT1B) demonstrate the superior performance of our SHPL over existing hashing methods.

...read moreread less

44 citations

Journal Article•DOI•

LSH-Preserving Functions and Their Applications

[...]

Flavio Chierichetti, Ravi Kumar¹•Institutions (1)

Google¹

02 Nov 2015-Journal of the ACM

TL;DR: This article generalizes the well-known LSH for the Jaccard set similarity, namely, the minwise-independent permutations, and obtains LSHs for many set similarity measures that are used in practice.

...read moreread less

Abstract: Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, that is, are LSHable. In this article, we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling.As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of mA∩ Bm, mAuBm, mA∪Bm: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwise-independent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals.

...read moreread less

44 citations

Book Chapter•DOI•

Efficient search in document image collections

[...]

Anand Kumar¹, C. V. Jawahar¹, R. Manmatha²•Institutions (2)

International Institute of Information Technology, Hyderabad¹, University of Massachusetts Amherst²

18 Nov 2007

TL;DR: This paper presents an efficient indexing and retrieval scheme for searching in document image databases that achieves high precision and recall, using a large image corpus consisting of seven Kalidasa's books in the Telugu language.

...read moreread less

Abstract: This paper presents an efficient indexing and retrieval scheme for searching in document image databases. In many non-European languages, optical character recognizers are not very accurate. Word spotting - word image matching - may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and/or nearest neighbor search, tend to be slow. Here indexing is done using locality sensitive hashing (LSH) - a technique which computes multiple hashes - using word image features computed at word level. Efficiency and scalability is achieved by content-sensitive hashing implemented through approximate nearest neighbor computation. We demonstrate that the technique achieves high precision and recall (in the 90% range), using a large image corpus consisting of seven Kalidasa's (a well known Indian poet of antiquity) books in the Telugu language. The accuracy is comparable to using dynamic time warping and nearest neighbor search while the speed is orders of magnitude better - 20000 word images can be searched in milliseconds.

...read moreread less

44 citations

Proceedings Article•DOI•

Multilinear Hyperplane Hashing

[...]

Xianglong Liu¹, Xinjie Fan, Cheng Deng², Zhujin Li¹, Hao Su³, Dacheng Tao⁴ - Show less +2 more•Institutions (4)

Beihang University¹, Xidian University², Stanford University³, University of Technology, Sydney⁴

01 Jun 2016

TL;DR: A multilinear hyperplane hashing that generates a hash bit using multiple linear projections with strong locality sensitivity to hyperplane queries is proposed and an angular quantization based learning framework for compact multil inear hashing is introduced, which considerably boosts the search performance with less hash bits.

...read moreread less

Abstract: Hashing has become an increasingly popular technique for fast nearest neighbor search. Despite its successful progress in classic pointto-point search, there are few studies regarding point-to-hyperplane search, which has strong practical capabilities of scaling up applications like active learning with SVMs. Existing hyperplane hashing methods enable the fast search based on randomly generated hash codes, but still suffer from a low collision probability and thus usually require long codes for a satisfying performance. To overcome this problem, this paper proposes a multilinear hyperplane hashing that generates a hash bit using multiple linear projections. Our theoretical analysis shows that with an even number of random linear projections, the multilinear hash function possesses strong locality sensitivity to hyperplane queries. To leverage its sensitivity to the angle distance, we further introduce an angular quantization based learning framework for compact multilinear hashing, which considerably boosts the search performance with less hash bits. Experiments with applications to large-scale (up to one million) active learning on two datasets demonstrate the overall superiority of the proposed approach.

...read moreread less

44 citations

Collapse

Network Information

Performance

Metrics

2,048

Papers

77,891

Citations

No. of papers in the topic in previous years
Year	Papers
2023	43
2022	108
2021	88
2020	110
2019	104
2018	139

Locality-sensitive hashing

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics