Topic

Locality-sensitive hashing

About: Locality-sensitive hashing (LSH) is a family of hashing techniques that map similar items to the same bucket with high probability, enabling efficient approximate nearest-neighbor search in high dimensions. Over the lifetime of the topic, 1,894 publications have been published, receiving 69,362 citations.
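For orientation, here is a minimal sketch of one classic LSH family, random-hyperplane hashing (SimHash) for cosine similarity; all function names and parameters below are illustrative, not drawn from any paper listed on this page:

```python
import numpy as np

def make_hyperplane_hasher(dim, n_bits, seed=0):
    """Map vectors to n_bits-bit SimHash codes via random hyperplanes.

    Two vectors at angle theta agree on each bit with probability
    1 - theta / pi, so similar vectors receive similar codes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))  # hyperplane normals

    def hash_vector(v):
        return (planes @ v > 0).astype(np.uint8)  # one sign bit per plane

    return hash_vector

hasher = make_hyperplane_hasher(dim=128, n_bits=16)
rng = np.random.default_rng(1)
a = rng.standard_normal(128)
b = a + 0.1 * rng.standard_normal(128)  # near-duplicate of a
c = rng.standard_normal(128)            # unrelated vector
print((hasher(a) == hasher(b)).mean())  # close to 1.0
print((hasher(a) == hasher(c)).mean())  # close to 0.5
```

Vectors at a small angle agree on most bits, so the codes can be bucketed or compared cheaply in Hamming space instead of computing exact similarities.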


Papers
Journal ArticleDOI
TL;DR: Introduces a more succinct and precise definition of spatial relationships in 2D space, together with a new approach that represents a picture by a set of hash values and avoids the ambiguity problems found in other methods.

20 citations

Book ChapterDOI
19 Apr 2016
TL;DR: This work proposes a novel blocking approach for multi-party PPRL to efficiently and effectively prune the record sets that are unlikely to match and provides an analysis of the technique in terms of complexity, quality, and privacy.
Abstract: In many application domains, organizations require information from multiple sources to be integrated. Due to privacy and confidentiality concerns, these organizations are often not willing, or not allowed, to reveal their sensitive and personal data to other database owners or to any external party. This has led to the emerging research discipline of privacy-preserving record linkage (PPRL). We propose a novel blocking approach for multi-party PPRL that efficiently and effectively prunes the record sets that are unlikely to match. Our approach allows each database owner to perform blocking independently, except for the initial agreement on parameter settings and a final central hashing-based clustering. We provide an analysis of our technique in terms of complexity, quality, and privacy, and conduct an empirical study with large datasets. The results show that our approach scales with the size of the datasets and the number of parties, while providing better quality and privacy than previous multi-party private blocking approaches.

20 citations
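The paper's multi-party protocol is not reproduced here; the sketch below only illustrates the general hashing-based blocking idea that such approaches build on: each party independently derives hash-based block keys from its records, so that only records sharing a key become comparison candidates. All names and parameters are hypothetical:

```python
import hashlib
from collections import defaultdict

def qgrams(value, q=2):
    """Overlapping character q-grams of a padded, lowercased string."""
    padded = f"#{value.lower()}#"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def block_keys(value, n_bands=4):
    """Illustrative MinHash blocking: one key per salted hash function.
    Two records share a given key with probability equal to the Jaccard
    similarity of their q-gram sets, so near-duplicates usually collide
    in at least one block."""
    keys = []
    for salt in range(n_bands):
        min_hash = min(hashlib.sha1(f"{salt}:{g}".encode()).hexdigest()
                       for g in qgrams(value))
        keys.append(f"{salt}:{min_hash[:6]}")
    return keys

# Each party runs this locally; only records sharing a block key are
# later compared, pruning the cross-party candidate space.
records = ["jane smith", "jane smyth", "john doe"]
blocks = defaultdict(list)
for rec in records:
    for key in block_keys(rec):
        blocks[key].append(rec)
print({tuple(v) for v in blocks.values() if len(v) > 1})
# likely {('jane smith', 'jane smyth')}
```

In a real PPRL setting the records themselves would additionally be encoded (e.g., into Bloom filters) before any comparison; the block keys serve only to prune the candidate space.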

Journal ArticleDOI
TL;DR: This article presents BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search (candidate pruning and similarity estimation using LSH) and extends BayesLSH to kernel methods, in which the similarity between two data objects is defined by a kernel function.
Abstract: Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. In order to reduce the number of candidates to search, locality-sensitive hashing (LSH) based indexing methods are very effective. However, most such methods use LSH only for the first phase of similarity search, that is, efficient indexing for candidate generation. In this article, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search: performing candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. Our algorithms are able to quickly prune away a large majority of the false-positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, both in terms of accuracy and recall. Finally, the quality of BayesLSH's output can be easily tuned and does not require any manual setting of the number of hashes to use for similarity estimation, unlike standard approaches. For two state-of-the-art candidate generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups, typically in the range of 2× to 20× for a wide variety of datasets. We also extend the BayesLSH algorithm to kernel methods, in which the similarity between two data objects is defined by a kernel function. Since the embedding of data points in the transformed kernel space is unknown, algorithms such as AllPairs that rely on building an inverted index for fast similarity search do not work with kernel functions. Exhaustive search across all possible pairs is not an option either, since the dataset can be huge and computing the kernel values for each pair can be prohibitive. We propose K-BayesLSH, an all-pairs similarity search algorithm for kernel functions. K-BayesLSH leverages a recently proposed idea, kernelized locality-sensitive hashing (KLSH), for hash-bit computation and candidate generation, and uses the aforementioned BayesLSH idea for candidate pruning and similarity estimation. We ran a broad spectrum of experiments on a variety of datasets drawn from different domains and with distinct kernels, and found a speedup of 2× to 7× over vanilla KLSH.

20 citations
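As a hedged illustration of the core statistical idea (not the paper's exact algorithm): inspect hash bits incrementally, maintain a Beta posterior over the per-bit collision probability (a monotone proxy for similarity under SimHash), and prune a candidate pair as soon as the posterior probability of exceeding the threshold drops below a cutoff. All names and parameters below are assumptions:

```python
import numpy as np
from scipy.stats import beta

def prune_by_posterior(bits_a, bits_b, threshold=0.9,
                       early_stop_p=0.05, batch=32):
    """Toy Bayesian pruning in the spirit of BayesLSH (not the paper's
    algorithm). Observe hash bits in batches; keep a Beta(1, 1) prior
    over the per-bit collision probability; stop early once
    P(collision probability >= threshold | observations) is small."""
    agree = disagree = 0
    for start in range(0, len(bits_a), batch):
        chunk_a = bits_a[start:start + batch]
        chunk_b = bits_b[start:start + batch]
        matches = int((chunk_a == chunk_b).sum())
        agree += matches
        disagree += len(chunk_a) - matches
        # Posterior is Beta(1 + agree, 1 + disagree); sf() gives the
        # probability mass above the similarity threshold.
        if beta.sf(threshold, 1 + agree, 1 + disagree) < early_stop_p:
            return "pruned", start + len(chunk_a)  # bits examined
    return "candidate", len(bits_a)

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 512)
b = np.where(rng.random(512) < 0.6, a, 1 - a)  # ~60% bit agreement
print(prune_by_posterior(a, b))  # likely pruned after one batch
```

Dissimilar pairs are rejected after a handful of hash comparisons rather than a full similarity computation, which is where the reported speedups come from.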

Proceedings ArticleDOI
26 May 2015
TL;DR: This work proposes a novel stereo correspondence estimation algorithm that employs binary locality sensitive hashing and is well suited to implementation on the GPU, capable of processing very high-resolution stereo images at near real-time rates.
Abstract: The stereo correspondence problem is still a highly active topic of research with many applications in the robotics domain. Yet many state-of-the-art algorithms proposed to date are unable to reasonably handle high-resolution images due to their run-time complexity or memory requirements. In this work we propose a novel stereo correspondence estimation algorithm that employs binary locality-sensitive hashing and is well suited to implementation on the GPU. Our proposed method is capable of processing very high-resolution stereo images at near real-time rates. An evaluation on the new Middlebury and Disney high-resolution stereo benchmarks demonstrates that our proposed method performs well compared to existing state-of-the-art algorithms.

20 citations
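The paper's GPU pipeline is not reproduced here; the toy below only shows the matching primitive that binary hashing enables in stereo: per-pixel binary codes compared by Hamming distance along a scanline, with winner-takes-all disparity selection. All names and sizes are hypothetical:

```python
import numpy as np

def best_disparity(left_codes, right_codes, max_disp=16):
    """Winner-takes-all scanline matching on binary codes: for each
    left-image pixel, pick the disparity whose right-image code has
    the smallest Hamming distance."""
    n, bits = left_codes.shape
    disp = np.zeros(n, dtype=int)
    for x in range(n):
        best_d, best_cost = 0, bits + 1
        for d in range(min(max_disp, x) + 1):
            cost = int(np.count_nonzero(left_codes[x] != right_codes[x - d]))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp

rng = np.random.default_rng(0)
right = rng.integers(0, 2, size=(64, 32), dtype=np.uint8)
left = np.roll(right, 5, axis=0)        # simulate a constant disparity of 5
print(best_disparity(left, right)[5:])  # mostly 5
```

Because Hamming distances on packed binary codes reduce to XOR and popcount, this inner loop maps naturally onto GPU hardware, which is the property the paper exploits.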

Proceedings Article
01 Jan 2020
TL;DR: The algorithm, SMYRF, uses locality-sensitive hashing (LSH) in a novel way, defining new asymmetric transformations and an adaptive scheme that produces balanced clusters; it can be used interchangeably with dense attention both before and after training.
Abstract: We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. In contrast, prior fast attention methods impose constraints (e.g., that queries and keys share the same vector representations) and require retraining from scratch. We apply our method to pre-trained state-of-the-art Natural Language Processing and Computer Vision models and report significant memory and speed benefits. Notably, SMYRF-BERT slightly outperforms BERT on GLUE while using $50\%$ less memory. We also show that SMYRF can be used interchangeably with dense attention before and after training. Finally, we use SMYRF to train GANs with attention at high resolutions. Using a single TPU, we were able to scale attention to 128x128 = 16k and 256x256 = 65k tokens for BigGAN on CelebA-HQ.

20 citations
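As a simplified sketch of LSH-clustered attention (omitting SMYRF's asymmetric transformations and adaptive multi-round scheme, so this is an assumption-laden toy rather than the paper's method): project queries and keys onto a random direction, sort by the projection, cut the sorted order into equal-size clusters, and attend only within each cluster:

```python
import numpy as np

def lsh_clustered_attention(Q, K, V, n_clusters=4, seed=0):
    """Toy balanced LSH attention (not SMYRF itself): queries and keys
    are bucketed by one random projection into equal-size clusters and
    softmax attention runs within each cluster, cutting the cost from
    O(N^2) to O(N^2 / n_clusters). Assumes N divisible by n_clusters."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    direction = rng.standard_normal(d)
    q_order = np.argsort(Q @ direction)  # queries sorted by hash value
    k_order = np.argsort(K @ direction)  # keys sorted by the same hash
    out = np.zeros_like(V)
    size = n // n_clusters               # balanced clusters by construction
    for c in range(n_clusters):
        qi = q_order[c * size:(c + 1) * size]
        ki = k_order[c * size:(c + 1) * size]
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out[qi] = weights @ V[ki]  # each query attends inside its cluster
    return out

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(lsh_clustered_attention(Q, K, V).shape)  # (64, 32)
```

Because every cluster has exactly N / n_clusters members, the work is evenly balanced across clusters, which is the property SMYRF's adaptive scheme is designed to guarantee.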


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 43
2022: 108
2021: 88
2020: 110
2019: 104
2018: 139