
Locality-sensitive hashing

About: Locality-sensitive hashing (LSH) is a technique for approximate nearest neighbor search that hashes similar items into the same buckets with high probability. Over the lifetime, 1894 publications have been published within this topic, receiving 69362 citations.
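As a primer on the topic itself, here is a minimal sketch of the classic random-hyperplane LSH family for cosine similarity; the dimension, bit count, and synthetic data are illustrative only.

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def make_hasher(dim, n_bits):
    # Each bit is the sign of a projection onto a random hyperplane;
    # vectors with high cosine similarity agree on most bits.
    planes = rng.standard_normal((n_bits, dim))
    return lambda x: tuple(int(b) for b in (planes @ x > 0))

hasher = make_hasher(dim=128, n_bits=16)

# Index: bucket every vector under its hash key.
data = rng.standard_normal((1000, 128))
buckets = defaultdict(list)
for i, v in enumerate(data):
    buckets[hasher(v)].append(i)

# Query: only the query's bucket is scanned; candidates can then be
# verified with an exact distance computation. (Production LSH uses
# several such tables to boost recall.)
query = data[0] + 0.01 * rng.standard_normal(128)
candidates = buckets[hasher(query)]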


Papers
Journal Article
TL;DR: This paper proposes a novel hashing method, named semi-paired hashing (SPH), to deal with a more challenging cross-view retrieval task, where only partial pairwise correspondences are provided in advance.

15 citations

Proceedings Article
25 Mar 2012
TL;DR: A novel method for efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model from a previous work, which allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach.
Abstract: We propose a novel method for efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. The model allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach because of its computational simplicity, while maintaining high accuracy. Furthermore, efficiency can be drastically improved by approximating searches using kernelized locality-sensitive hashing (KLSH). From a speaker's utterance, a set of statistics is extracted according to the utterance comparison model and converted to a set of hash key bits. An approximate nearest neighbor search using the Hamming distance can then be performed to find candidate matches for the query speaker, which are rank-ordered by linearly comparing them with the query using the utterance comparison model. Compared to GMM-based speaker identification and some of its variants that have been proposed to increase its efficiency, the proposed KLSH-based method is orders of magnitude faster while sacrificing a negligible amount of accuracy for sufficiently long query utterances. At a more fundamental level, we also discuss how our speaker matching framework differs from the traditional Bayesian decision rule used for speaker identification.

15 citations
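A minimal sketch of the two-stage search described in this abstract, under simplifying assumptions: plain random-hyperplane bits stand in for the paper's kernelized hashing, and a dot product stands in for the utterance comparison score.

import numpy as np

def hash_bits(x, planes):
    # Binary hash key: signs of projections onto random hyperplanes.
    # (The paper uses kernelized LSH; plain LSH stands in here.)
    return planes @ x > 0

def hamming(a, b):
    return int(np.count_nonzero(a != b))

def search(query, db_vecs, db_bits, planes, shortlist=100):
    # Stage 1: coarse filtering by Hamming distance between hash keys.
    q = hash_bits(query, planes)
    order = sorted(range(len(db_bits)), key=lambda i: hamming(q, db_bits[i]))
    candidates = order[:shortlist]
    # Stage 2: exact linear rescoring of the shortlist; a dot product
    # stands in for the utterance comparison model's score.
    return sorted(candidates, key=lambda i: -float(query @ db_vecs[i]))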

Proceedings Article
14 Jun 2020
TL;DR: A hashing function for face template protection that improves the correctness of existing algorithms while maintaining security, and reaches the REQ-WBP (Weak Biometric Privacy) security level, which implies irreversibility.
Abstract: In this paper, we present a hashing function for face template protection that improves the correctness of existing algorithms while maintaining their security. The architecture is built from four components: a self-defined concept called padding people, Random Fourier Features, a Support Vector Machine, and Locality Sensitive Hashing. The proposed method is trained, with one-shot and multi-shot enrollment, to encode the user's biometric data to a predefined output with high probability. The predefined hashing output is cryptographically hashed and stored as a secure face template. Predesigning the outputs ensures that the strict requirements of biometric cryptosystems, namely randomness and unlinkability, are met. We prove that our method reaches the REQ-WBP (Weak Biometric Privacy) security level, which implies irreversibility. The efficacy of our approach is evaluated on the widely used CMU-PIE, FEI, and FERET databases; our matching performance achieves a 100% genuine acceptance rate at a 0% false acceptance rate for all three databases and both enrollment types. To our knowledge, our matching results outperform most state-of-the-art results.

15 citations
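A minimal sketch of the template-protection idea described above, assuming a generic LSH bit mapping; the paper's padding-people construction, Random Fourier Features, and SVM training (which stabilize the code across noisy captures) are omitted.

import hashlib
import numpy as np

def enroll(features, planes):
    # Map the biometric feature vector to a binary code via LSH, then
    # store only a cryptographic hash of that code; the raw code and
    # the raw biometric are discarded.
    code = (planes @ features > 0).astype(np.uint8)
    return hashlib.sha256(code.tobytes()).hexdigest()

def verify(features, planes, stored_digest):
    # A probe matches only if it reproduces exactly the enrolled code,
    # so the stored template reveals nothing about the face features.
    code = (planes @ features > 0).astype(np.uint8)
    return hashlib.sha256(code.tobytes()).hexdigest() == stored_digest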

Posted Content
TL;DR: A straightforward CNN-based hashing method, i.e. binarizing the activations of a fully connected layer with threshold 0 and taking the binary result as hash codes, which achieved the best performance on CIFAR-10 and was comparable with the state-of-the-art on MNIST.
Abstract: As data on the web increases dramatically, hashing is becoming more and more popular as a method of approximate nearest neighbor search. Previous supervised hashing methods used a similarity/dissimilarity matrix to obtain semantic information, but such a matrix is not easy to construct for a new dataset. Rather than constructing the matrix, we propose a straightforward CNN-based hashing method: binarizing the activations of a fully connected layer with threshold 0 and taking the binary result as hash codes. This method achieved the best performance on CIFAR-10 and was comparable with the state-of-the-art on MNIST. Our experiments on CIFAR-10 suggest that the signs of activations may carry more information than the relative values of activations between samples, and that the co-adaptation between the feature extractor and the hash functions is important for hashing.

15 citations
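The core of this method reduces to a one-line thresholding step. A sketch, where fc_out is a hypothetical array holding the penultimate fully-connected-layer activations of a trained CNN:

import numpy as np

def activations_to_codes(fc_activations):
    # Threshold the activations at 0 and use the sign pattern directly
    # as the hash code; packbits gives compact byte-level storage.
    bits = (fc_activations > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

# Hypothetical usage: fc_out has shape (batch, n_bits) and comes from
# any trained CNN's fully connected layer.
# codes = activations_to_codes(fc_out)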

Proceedings Article
09 Jul 2016
TL;DR: A very straightforward supervised hashing algorithm that treats label vectors as binary codes and learns target codes with similar structure to the label vectors, circumventing direct optimization over large n × n Gram matrices.
Abstract: Among learning-based hashing methods, supervised hashing tries to find hash codes that preserve the semantic similarities of the original data. Recent years have witnessed much effort devoted to designing objective functions and optimization methods for supervised hashing, in order to improve search accuracy and reduce training cost. In this paper, we propose a very straightforward supervised hashing algorithm and demonstrate its superiority over several state-of-the-art methods. The key idea of our approach is to treat label vectors as binary codes and to learn target codes that have a similar structure to the label vectors. To circumvent direct optimization over large n × n Gram matrices, we identify an inner-product-preserving transformation and use it to bring label vectors and hash codes close together without changing the structure. The optimization process is very efficient and scales well. In our experiments, training 16-bit and 96-bit codes on NUS-WIDE took only 3 and 6 minutes, respectively.

15 citations
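A loose sketch of the "labels as target codes" idea, under simplifying assumptions: a ridge-regularized linear map is fit from features toward ±1-mapped label vectors, and signs give the hash bits. The paper's inner-product-preserving transformation and its optimization details are not reproduced here.

import numpy as np

def fit_hash_projection(X, Y01, reg=1.0):
    # Fit a linear map from features X (n x d) toward {-1,+1} label
    # targets (n x c), a simplified stand-in for the paper's
    # inner-product-preserving formulation.
    T = 2.0 * Y01 - 1.0                    # multi-hot labels -> +/-1 targets
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ T)
    return W

def encode(X, W):
    return (X @ W > 0).astype(np.uint8)    # one hash bit per label dimension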


Network Information
Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations, 84% related
- Feature extraction: 111.8K papers, 2.1M citations, 83% related
- Convolutional neural network: 74.7K papers, 2M citations, 83% related
- Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
- Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:

Year   Papers
2023   43
2022   108
2021   88
2020   110
2019   104
2018   139