Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have appeared on this topic, receiving 69,362 citations.


Papers
Proceedings ArticleDOI
01 Oct 2013
TL;DR: It is pointed out that the approach via the cavity method extends quite naturally to the analysis of double hashing and allows computing the corresponding threshold; the paper shows that the graph induced by the double hashing scheme has the same local weak limit as the one obtained with full randomness.
Abstract: A lot of interest has recently arisen in the analysis of multiple-choice “cuckoo hashing” schemes. In this context, a main performance criterion is the load threshold under which the hashing scheme is able to build a valid hashtable with high probability in the limit of large systems; various techniques have successfully been used to answer this question (differential equations, combinatorics, cavity method) for increasing levels of generality of the model. However, the hashing scheme analysed so far is quite utopian in that it requires generating a lot of independent, fully random choices. Schemes with reduced randomness exist, such as “double hashing”, which is expected to provide asymptotic results similar to those of the ideal scheme, yet they have been more resistant to analysis so far. In this paper, we point out that the approach via the cavity method extends quite naturally to the analysis of double hashing and allows us to compute the corresponding threshold. The path followed is to show that the graph induced by the double hashing scheme has the same local weak limit as the one obtained with full randomness.
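
To make the reduced-randomness scheme concrete, here is a minimal Python sketch of how double hashing derives the k candidate buckets used in multiple-choice cuckoo hashing. The two salted hash functions are illustrative stand-ins, not the paper's construction; in practice one would use independent hash functions and a prime table size.

```python
# A minimal sketch of double hashing for multiple-choice cuckoo hashing:
# instead of k independent, fully random choices, only two hash values
# are generated, and choice i is (f(key) + i * g(key)) mod m.

def double_hash_choices(key, k, m):
    """Return the k candidate buckets for `key` in a table of size m."""
    f = hash(("f-salt", key)) % m        # illustrative stand-in hash
    g = hash(("g-salt", key)) % m or 1   # keep the stride non-zero
    return [(f + i * g) % m for i in range(k)]

# Toy usage: 3 candidate buckets in a table of prime size 101.
print(double_hash_choices("alice", k=3, m=101))
```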

13 citations

Proceedings ArticleDOI
01 May 2018
TL;DR: Experimental results show that the proposed model is more accurate and robust than traditional models and is well suited for speaker recognition.
Abstract: Mel-Frequency Cepstral Coefficients (MFCC) features can be used in speaker recognition. Extracting features from the speech signal as MFCC feature vectors yields an acoustic representation of the signal. Locality Sensitive Hashing (LSH) is frequently used as a classifier for Big Data problems. In this research, we propose a new model that integrates MFCC and LSH into a speaker recognition model. The main benefits of the newly proposed model are robust, effective, and accurate results in comparison with the MFCC+GMM, LPCC+GMM, and MFCC+PNN models; the model also contributes to the Big Data literature. In this model, we first extract the MFCC features from the wave file, then apply the LSH classifier to the extracted features to transform them into a hash table. Finally, the hash tables of the training and test wave files are compared, obtaining 92.66% speaker recognition accuracy. We compared the accuracy of the proposed model with the traditional MFCC+GMM, MFCC+PNN, and LPCC+GMM models; experimental results show that the proposed model is more accurate and robust than the traditional models and is well suited for speaker recognition.
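
As a rough illustration of the described pipeline (not the authors' implementation), the following Python sketch hashes per-frame feature vectors with a random-hyperplane LSH and compares two recordings by bucket overlap. The MFCC frames here are random stand-ins; in practice they could come from a library such as librosa (librosa.feature.mfcc(y=signal, sr=sr).T yields a frames-by-coefficients matrix).

```python
# Sketch: random-hyperplane LSH over MFCC frames, with recordings
# compared by the Jaccard overlap of their occupied hash buckets.
import numpy as np

rng = np.random.default_rng(0)

def lsh_codes(frames, planes):
    """Hash each frame to a tuple of sign bits (one hash table)."""
    bits = frames @ planes.T > 0          # (n_frames, n_bits) booleans
    return {tuple(row) for row in bits}

def similarity(train_frames, test_frames, n_bits=16, dim=13):
    planes = rng.standard_normal((n_bits, dim))   # shared hyperplanes
    a = lsh_codes(train_frames, planes)
    b = lsh_codes(test_frames, planes)
    return len(a & b) / max(len(a | b), 1)        # bucket overlap

# Toy usage with random stand-ins for 13-dimensional MFCC frames:
train = rng.standard_normal((200, 13))
test = train + 0.05 * rng.standard_normal((200, 13))  # similar "speaker"
print(similarity(train, test))
```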

13 citations

Book ChapterDOI
05 Jan 2011
TL;DR: This paper proposes a new high-dimensional NN search method, called Randomly Projected kd-Trees (RP-kd-Trees), which projects data points into a lower-dimensional space so as to exploit the advantage of multiple kd-trees on low-dimensional data.
Abstract: Efficient nearest neighbor (NN) search techniques for high-dimensional data are crucial to content-based image retrieval (CBIR). Traditional data structures (e.g., the kd-tree) are usually efficient only for low-dimensional data, and often perform no better than a simple exhaustive linear search when the number of dimensions grows large. Recently, approximate NN search techniques have been proposed for high-dimensional search, such as Locality-Sensitive Hashing (LSH), which adopts a random projection approach. Motivated by a similar idea, in this paper we propose a new high-dimensional NN search method, called Randomly Projected kd-Trees (RP-kd-Trees), which projects data points into a lower-dimensional space so as to exploit the advantage of multiple kd-trees over low-dimensional data. Based on the proposed framework, we present an enhanced RP-kd-Trees scheme by applying distance metric learning techniques. We conducted extensive empirical studies on CBIR, which showed that our technique achieved faster search performance with better retrieval quality than regular LSH algorithms.
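
The following Python sketch illustrates the stated idea under assumed parameters (n_trees, proj_dim, and the candidate count are illustrative choices, not the paper's settings): project the data with random Gaussian matrices, build one kd-tree per projection (scipy.spatial.cKDTree here), gather candidates from all trees, and re-rank them in the original space.

```python
# Sketch of the RP-kd-Trees idea: multiple kd-trees over random
# low-dimensional projections, with exact re-ranking in full dimension.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def build_rp_kdtrees(data, n_trees=4, proj_dim=8):
    trees = []
    for _ in range(n_trees):
        P = rng.standard_normal((data.shape[1], proj_dim))  # random projection
        trees.append((P, cKDTree(data @ P)))
    return trees

def query(trees, data, q, k=5, candidates_per_tree=20):
    cand = set()
    for P, tree in trees:
        _, idx = tree.query(q @ P, k=candidates_per_tree)   # per-tree candidates
        cand.update(idx.tolist())
    cand = np.fromiter(cand, dtype=int)
    d = np.linalg.norm(data[cand] - q, axis=1)              # re-rank in full space
    return cand[np.argsort(d)[:k]]

data = rng.standard_normal((1000, 128))
trees = build_rp_kdtrees(data)
print(query(trees, data, data[0]))
```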

13 citations

Proceedings ArticleDOI
14 Jul 2014
TL;DR: A novel data-driven hashing method called forest hashing, which uses multiple tree structures to hash data; by leveraging the index structure of trees, it generates balanced hash buckets and thereby significantly improves hashing efficacy.
Abstract: Indexing images and videos using binary hash bits has shown promising results for fast similarity search. Existing data-driven hashing methods learn compact hash codes from the data, but usually at the cost of generating unbalanced hash buckets, thus affecting search efficiency. We propose a novel data-driven hashing method called forest hashing, which utilizes multiple tree structures to perform data hashing. By leveraging the index structure of trees, we can significantly improve the hashing efficacy by generating balanced hash buckets. Moreover, forest hashing naturally supports scalable coding, where more trees improve the coding quality with a longer code. Last but not least, our forest hashing can be easily extended for semantic search by integrating semi-supervised label information. Experiments on two benchmark datasets show favorable results compared with state-of-the-art hashing methods.
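
A minimal Python sketch of the balanced-bucket idea, assuming median splits on randomly chosen features (an illustrative reconstruction, not the paper's exact forest hashing algorithm): each tree's root-to-leaf path bits form a hash code, and splitting every node at its median keeps all buckets roughly the same size.

```python
# Sketch: tree-based hashing with median splits, so the resulting hash
# buckets (leaves) are near-balanced by construction.
import numpy as np

rng = np.random.default_rng(0)

def tree_hash(data, depth=4):
    """Return a `depth`-bit code per point from one median-split tree."""
    codes = np.zeros(len(data), dtype=int)
    groups = [np.arange(len(data))]
    for _ in range(depth):
        next_groups = []
        for idx in groups:                      # split each node at its median
            f = rng.integers(data.shape[1])     # random feature per node
            thr = np.median(data[idx, f])
            right = data[idx, f] > thr
            codes[idx] = (codes[idx] << 1) | right
            next_groups += [idx[~right], idx[right]]
        groups = next_groups
    return codes

data = rng.standard_normal((64, 32))
codes = [tree_hash(data) for _ in range(3)]     # a small "forest" of 3 trees
print(np.bincount(codes[0], minlength=16))      # bucket sizes are near-equal
```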

13 citations


Network Information
Related Topics (5)

Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139