Topic

Locality-sensitive hashing

About: Locality-sensitive hashing (LSH) is a family of hashing techniques that map similar items to the same bucket with high probability, enabling efficient approximate nearest-neighbor search in high dimensions. Over the lifetime of the topic, 1,894 publications have been published, receiving 69,362 citations.
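For orientation, here is a minimal sketch of one classic LSH family, random-hyperplane hashing (SimHash) for cosine similarity; all function names and parameters below are illustrative, not drawn from any paper listed on this page:

```python
import numpy as np

def make_hyperplane_hasher(dim, n_bits, seed=0):
    """Map vectors to n_bits-bit SimHash codes via random hyperplanes.

    Two vectors at angle theta agree on each bit with probability
    1 - theta / pi, so similar vectors receive similar codes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))  # hyperplane normals

    def hash_vector(v):
        return (planes @ v > 0).astype(np.uint8)  # one sign bit per plane

    return hash_vector

hasher = make_hyperplane_hasher(dim=128, n_bits=16)
rng = np.random.default_rng(1)
a = rng.standard_normal(128)
b = a + 0.1 * rng.standard_normal(128)  # near-duplicate of a
c = rng.standard_normal(128)            # unrelated vector
print((hasher(a) == hasher(b)).mean())  # close to 1.0
print((hasher(a) == hasher(c)).mean())  # close to 0.5
```

Vectors at a small angle agree on most bits, so the codes can be bucketed or compared cheaply in Hamming space instead of computing exact similarities.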


Papers
Journal ArticleDOI
TL;DR: Introduces a more succinct and precise definition of spatial relationships in 2D space, together with a new approach that represents a picture by a set of hash values and avoids the ambiguity problems found in other methods.

20 citations

Book ChapterDOI
19 Apr 2016
TL;DR: This work proposes a novel blocking approach for multi-party PPRL to efficiently and effectively prune the record sets that are unlikely to match and provides an analysis of the technique in terms of complexity, quality, and privacy.
Abstract: In many application domains, organizations require information from multiple sources to be integrated. Due to privacy and confidentiality concerns, these organizations are often not willing, or not allowed, to reveal their sensitive and personal data to other database owners or to any external party. This has led to the emerging research discipline of privacy-preserving record linkage (PPRL). We propose a novel blocking approach for multi-party PPRL that efficiently and effectively prunes the record sets that are unlikely to match. Our approach allows each database owner to perform blocking independently, except for the initial agreement on parameter settings and a final central hashing-based clustering. We provide an analysis of our technique in terms of complexity, quality, and privacy, and conduct an empirical study with large datasets. The results show that our approach scales with the size of the datasets and the number of parties, while providing better quality and privacy than previous multi-party private blocking approaches.

20 citations
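The paper's multi-party protocol is not reproduced here; the sketch below only illustrates the general hashing-based blocking idea that such approaches build on: each party independently derives hash-based block keys from its records, so that only records sharing a key become comparison candidates. All names and parameters are hypothetical:

```python
import hashlib
from collections import defaultdict

def qgrams(value, q=2):
    """Overlapping character q-grams of a padded, lowercased string."""
    padded = f"#{value.lower()}#"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def block_keys(value, n_bands=4):
    """Illustrative MinHash blocking: one key per salted hash function.
    Two records share a given key with probability equal to the Jaccard
    similarity of their q-gram sets, so near-duplicates usually collide
    in at least one block."""
    keys = []
    for salt in range(n_bands):
        min_hash = min(hashlib.sha1(f"{salt}:{g}".encode()).hexdigest()
                       for g in qgrams(value))
        keys.append(f"{salt}:{min_hash[:6]}")
    return keys

# Each party runs this locally; only records sharing a block key are
# later compared, pruning the cross-party candidate space.
records = ["jane smith", "jane smyth", "john doe"]
blocks = defaultdict(list)
for rec in records:
    for key in block_keys(rec):
        blocks[key].append(rec)
print({tuple(v) for v in blocks.values() if len(v) > 1})
# likely {('jane smith', 'jane smyth')}
```

In a real PPRL setting the records themselves would additionally be encoded (e.g., into Bloom filters) before any comparison; the block keys serve only to prune the candidate space.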

Journal ArticleDOI
TL;DR: This article presents BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search (candidate pruning and similarity estimation using LSH) and extends BayesLSH to kernel methods, in which the similarity between two data objects is defined by a kernel function.
Abstract: Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. In order to reduce the number of candidates to search, locality-sensitive hashing (LSH) based indexing methods are very effective. However, most such methods use LSH only for the first phase of similarity search, that is, efficient indexing for candidate generation. In this article, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search: performing candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. Our algorithms are able to quickly prune away a large majority of the false-positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, both in terms of accuracy and recall. Finally, the quality of BayesLSH's output can be easily tuned and does not require any manual setting of the number of hashes to use for similarity estimation, unlike standard approaches. For two state-of-the-art candidate generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups, typically in the range of 2× to 20× for a wide variety of datasets. We also extend the BayesLSH algorithm to kernel methods, in which the similarity between two data objects is defined by a kernel function. Since the embedding of data points in the transformed kernel space is unknown, algorithms such as AllPairs that rely on building an inverted index for fast similarity search do not work with kernel functions. Exhaustive search across all possible pairs is not an option either, since the dataset can be huge and computing the kernel values for each pair can be prohibitive. We propose K-BayesLSH, an all-pairs similarity search algorithm for kernel functions. K-BayesLSH leverages a recently proposed idea, kernelized locality-sensitive hashing (KLSH), for hash-bit computation and candidate generation, and uses the aforementioned BayesLSH idea for candidate pruning and similarity estimation. We ran a broad spectrum of experiments on a variety of datasets drawn from different domains and with distinct kernels, and found a speedup of 2× to 7× over vanilla KLSH.

20 citations
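As a hedged illustration of the core statistical idea (not the paper's exact algorithm): inspect hash bits incrementally, maintain a Beta posterior over the per-bit collision probability (a monotone proxy for similarity under SimHash), and prune a candidate pair as soon as the posterior probability of exceeding the threshold drops below a cutoff. All names and parameters below are assumptions:

```python
import numpy as np
from scipy.stats import beta

def prune_by_posterior(bits_a, bits_b, threshold=0.9,
                       early_stop_p=0.05, batch=32):
    """Toy Bayesian pruning in the spirit of BayesLSH (not the paper's
    algorithm). Observe hash bits in batches; keep a Beta(1, 1) prior
    over the per-bit collision probability; stop early once
    P(collision probability >= threshold | observations) is small."""
    agree = disagree = 0
    for start in range(0, len(bits_a), batch):
        chunk_a = bits_a[start:start + batch]
        chunk_b = bits_b[start:start + batch]
        matches = int((chunk_a == chunk_b).sum())
        agree += matches
        disagree += len(chunk_a) - matches
        # Posterior is Beta(1 + agree, 1 + disagree); sf() gives the
        # probability mass above the similarity threshold.
        if beta.sf(threshold, 1 + agree, 1 + disagree) < early_stop_p:
            return "pruned", start + len(chunk_a)  # bits examined
    return "candidate", len(bits_a)

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 512)
b = np.where(rng.random(512) < 0.6, a, 1 - a)  # ~60% bit agreement
print(prune_by_posterior(a, b))  # likely pruned after one batch
```

Dissimilar pairs are rejected after a handful of hash comparisons rather than a full similarity computation, which is where the reported speedups come from.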

Proceedings ArticleDOI
26 May 2015
TL;DR: This work proposes a novel stereo correspondence estimation algorithm that employs binary locality sensitive hashing and is well suited to implementation on the GPU, capable of processing very high-resolution stereo images at near real-time rates.
Abstract: The stereo correspondence problem is still a highly active topic of research with many applications in the robotics domain. Yet many state-of-the-art algorithms proposed to date are unable to reasonably handle high-resolution images due to their run-time complexity or memory requirements. In this work we propose a novel stereo correspondence estimation algorithm that employs binary locality-sensitive hashing and is well suited to implementation on the GPU. Our proposed method is capable of processing very high-resolution stereo images at near real-time rates. An evaluation on the new Middlebury and Disney high-resolution stereo benchmarks demonstrates that our proposed method performs well compared to existing state-of-the-art algorithms.

20 citations
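The paper's GPU pipeline is not reproduced here; the toy below only shows the matching primitive that binary hashing enables in stereo: per-pixel binary codes compared by Hamming distance along a scanline, with winner-takes-all disparity selection. All names and sizes are hypothetical:

```python
import numpy as np

def best_disparity(left_codes, right_codes, max_disp=16):
    """Winner-takes-all scanline matching on binary codes: for each
    left-image pixel, pick the disparity whose right-image code has
    the smallest Hamming distance."""
    n, bits = left_codes.shape
    disp = np.zeros(n, dtype=int)
    for x in range(n):
        best_d, best_cost = 0, bits + 1
        for d in range(min(max_disp, x) + 1):
            cost = int(np.count_nonzero(left_codes[x] != right_codes[x - d]))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp

rng = np.random.default_rng(0)
right = rng.integers(0, 2, size=(64, 32), dtype=np.uint8)
left = np.roll(right, 5, axis=0)        # simulate a constant disparity of 5
print(best_disparity(left, right)[5:])  # mostly 5
```

Because Hamming distances on packed binary codes reduce to XOR and popcount, this inner loop maps naturally onto GPU hardware, which is the property the paper exploits.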

Proceedings Article
01 Jan 2020
TL;DR: The algorithm, SMYRF, uses locality-sensitive hashing (LSH) in a novel way, defining new asymmetric transformations and an adaptive scheme that produces balanced clusters; it can be used interchangeably with dense attention both before and after training.
Abstract: We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. In contrast, prior fast attention methods impose constraints (e.g., that queries and keys share the same vector representations) and require retraining from scratch. We apply our method to pre-trained state-of-the-art Natural Language Processing and Computer Vision models and report significant memory and speed benefits. Notably, SMYRF-BERT slightly outperforms BERT on GLUE while using $50\%$ less memory. We also show that SMYRF can be used interchangeably with dense attention before and after training. Finally, we use SMYRF to train GANs with attention at high resolutions. Using a single TPU, we were able to scale attention to 128x128 = 16k and 256x256 = 65k tokens for BigGAN on CelebA-HQ.

20 citations
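As a simplified sketch of LSH-clustered attention (omitting SMYRF's asymmetric transformations and adaptive multi-round scheme, so this is an assumption-laden toy rather than the paper's method): project queries and keys onto a random direction, sort by the projection, cut the sorted order into equal-size clusters, and attend only within each cluster:

```python
import numpy as np

def lsh_clustered_attention(Q, K, V, n_clusters=4, seed=0):
    """Toy balanced LSH attention (not SMYRF itself): queries and keys
    are bucketed by one random projection into equal-size clusters and
    softmax attention runs within each cluster, cutting the cost from
    O(N^2) to O(N^2 / n_clusters). Assumes N divisible by n_clusters."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    direction = rng.standard_normal(d)
    q_order = np.argsort(Q @ direction)  # queries sorted by hash value
    k_order = np.argsort(K @ direction)  # keys sorted by the same hash
    out = np.zeros_like(V)
    size = n // n_clusters               # balanced clusters by construction
    for c in range(n_clusters):
        qi = q_order[c * size:(c + 1) * size]
        ki = k_order[c * size:(c + 1) * size]
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out[qi] = weights @ V[ki]  # each query attends inside its cluster
    return out

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(lsh_clustered_attention(Q, K, V).shape)  # (64, 32)
```

Because every cluster has exactly N / n_clusters members, the work is evenly balanced across clusters, which is the property SMYRF's adaptive scheme is designed to guarantee.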


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 43
2022: 108
2021: 88
2020: 110
2019: 104
2018: 139