scispace - formally typeset
Search or ask a question
Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The proposed cam weighted distance is orientation and scale adaptive to take advantage of the relevant information of inter-prototype relationship, so that a better classification performance can be achieved.

63 citations

Proceedings ArticleDOI
29 Oct 2010
TL;DR: An image-based indoor localization system using omnidirectional panoramic images to which location information is added by the combination of the robust image matching by PCA-SIFT and fast nearest neighbor search algorithm based on Locality Sensitive Hashing (LSH).
Abstract: In this paper, we developed an image-based indoor localization system using omnidirectional panoramic images to which location information is added. By the combination of the robust image matching by PCA-SIFT and fast nearest neighbor search algorithm based on Locality Sensitive Hashing (LSH), our system can estimate users' positions with high accuracy and in a short time. To improve the precision, we introduced the "confidence" of the image matching results. We conducted experiments at the Railway Museum and we obtained 426 omnidirectional panoramic reference images and 1067 supplemental images for image matching. Experimental results using 126 test images demonstrated that the location detection accuracy is above 90% with about 2.2s of processing time.

63 citations

Proceedings Article
01 Jan 2006
TL;DR: To scale the search to large song databases, an algorithm based on localitysensitive hashing (LSH) of sequences of audio features called audio shingles provides an efficient means to identify approximate nearest neighbors in a high-dimensional feature space.
Abstract: We present new methods for computing inter-song similarities using intersections between multiple audio pieces. The intersection contains portions that are similar, when one song is a derivative work of the other for example, in two different musical recordings. To scale our search to large song databases we have developed an algorithm based on localitysensitive hashing (LSH) of sequences of audio features called audio shingles. LSH provides an efficient means to identify approximate nearest neighbors in a high-dimensional feature space. We combine these nearest neighbor estimates, each a match from a very large database of audio to a small portion of the query song, to form a measure of the approximate similarity. We demonstrate the utility of our methods on a derivative works retrieval experiment using both exact and approximate (LSH) methods. The results show that LSH is at least an order of magnitude faster than the exact nearest neighbor method and that accuracy is not impacted by the approximate method.

62 citations

Proceedings ArticleDOI
13 Oct 2015
TL;DR: This paper proposes a novel unsupervised hashing approach, dubbed multi-view latent hashing (MVLH), to effectively incorporate multi- view data into hash code learning, and provides a novel scheme to directly learn the codes without resorting to continuous relaxations.
Abstract: Hashing techniques have attracted broad research interests in recent multimedia studies. However, most of existing hashing methods focus on learning binary codes from data with only one single view, and thus cannot fully utilize the rich information from multiple views of data. In this paper, we propose a novel unsupervised hashing approach, dubbed multi-view latent hashing (MVLH), to effectively incorporate multi-view data into hash code learning. Specifically, the binary codes are learned by the latent factors shared by multiple views from an unified kernel feature space, where the weights of different views are adaptively learned according to the reconstruction error with each view. We then propose to solve the associate optimization problem with an efficient alternating algorithm. To obtain high-quality binary codes, we provide a novel scheme to directly learn the codes without resorting to continuous relaxations, where each bit is efficiently computed in a closed form. We evaluate the proposed method on several large-scale datasets and the results demonstrate the superiority of our method over several other state-of-the-art methods.

62 citations

Journal ArticleDOI
TL;DR: A new fingerprint, called MinHash fingerprint, up to six bonds (MHFP6), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms.
Abstract: Among the various molecular fingerprints available to describe small organic molecules, extended connectivity fingerprint, up to four bonds (ECFP4) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥ 1024D) to perform well, resulting in ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC to perform very slowly due to the curse of dimensionality. Herein we report a new fingerprint, called MinHash fingerprint, up to six bonds (MHFP6), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. By leveraging locality sensitive hashing, LSH approximate nearest neighbor search methods perform as well on unfolded MHFP6 as comparable methods do on folded ECFP4 fingerprints in terms of speed and relative recovery rate, while operating in very sparse and high-dimensional binary chemical space. MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. The source code for MHFP6 is available on GitHub ( https://github.com/reymond-group/mhfp ).

62 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202343
2022108
202188
2020110
2019104
2018139