Open Access · Posted Content

Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

TLDR
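The Indyk-Motwani locality-sensitive hashing (LSH) framework is a general technique for constructing a data structure that answers approximate near neighbor queries, using a distribution over locality-sensitive hash functions that partition space. This paper shows that the number of locality-sensitive hash function evaluations per query can be reduced to $O(\log^2 n)$ while the number of lookups and distance computations closely matches that of the original framework.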
Abstract
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution $\mathcal{H}$ over locality-sensitive hash functions that partition space. For a collection of $n$ points, after preprocessing, the query time is dominated by $O(n^{\rho} \log n)$ evaluations of hash functions from $\mathcal{H}$ and $O(n^{\rho})$ hash table lookups and distance computations, where $\rho \in (0,1)$ is determined by the locality-sensitivity properties of $\mathcal{H}$. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to $O(\log^2 n)$, leaving the query time to be dominated by $O(n^{\rho})$ distance computations and $O(n^{\rho} \log n)$ additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely matches that of the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006), we are able to reduce the number of additional word-RAM operations to $O(n^\rho)$.
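To make the baseline concrete, here is a minimal sketch of the classic Indyk-Motwani construction that the abstract takes as its starting point. It is not the faster framework proposed in the paper. Several choices are assumptions made for illustration only: the bit-sampling hash family for binary vectors under Hamming distance, the class name `BitSamplingLSH`, and the parameter names `r` (near radius) and `c` (approximation factor). With collision probabilities $p_1$ at distance $r$ and $p_2$ at distance $cr$, the sketch uses the standard settings $\rho = \log(1/p_1)/\log(1/p_2)$, $k = \lceil \log n / \log(1/p_2) \rceil$ hash functions per table and $L = \lceil p_1^{-k} \rceil = O(n^{\rho})$ tables, so a query evaluates $kL = O(n^{\rho} \log n)$ hash functions.

```python
import math
import random
from collections import defaultdict


def hamming(x, y):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(x, y))


class BitSamplingLSH:
    """Illustrative Indyk-Motwani index (hypothetical name, Hamming space only)."""

    def __init__(self, points, r, c, seed=0):
        n, d = len(points), len(points[0])
        if not (0 < r and c > 1 and c * r < d):
            raise ValueError("need 0 < r < c*r < d")
        rng = random.Random(seed)
        self.points, self.r, self.c = points, r, c
        p1 = 1 - r / d        # collision probability of one sampled bit at distance r
        p2 = 1 - c * r / d    # collision probability of one sampled bit at distance c*r
        self.rho = math.log(1 / p1) / math.log(1 / p2)
        # k concatenated bits per table push far-point collisions down to about 1/n.
        self.k = max(1, math.ceil(math.log(n) / math.log(1 / p2)))
        # L = ceil(p1^-k) = O(n^rho) tables give constant success probability.
        self.L = math.ceil(p1 ** -self.k)
        self.coords = [[rng.randrange(d) for _ in range(self.k)]
                       for _ in range(self.L)]
        self.tables = [defaultdict(list) for _ in range(self.L)]
        for idx, x in enumerate(points):
            for coords, table in zip(self.coords, self.tables):
                table[self._key(x, coords)].append(idx)

    @staticmethod
    def _key(x, coords):
        """Concatenate the sampled bits into one hash table key."""
        return tuple(x[i] for i in coords)

    def query(self, q):
        """Return the index of some point within distance c*r of q, or None."""
        far_seen = 0
        for coords, table in zip(self.coords, self.tables):
            for idx in table.get(self._key(q, coords), ()):
                if hamming(self.points[idx], q) <= self.c * self.r:
                    return idx
                far_seen += 1
                if far_seen > 3 * self.L:   # cap on wasted distance computations
                    return None
        return None
```

A query costs `k * L` bit samples plus at most `3 * L + 1` distance computations, which mirrors the $O(n^{\rho} \log n)$ hash evaluations and $O(n^{\rho})$ lookups and distance computations stated above. The result summarized in the abstract targets the first term by reducing the number of hash function evaluations.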


Citations
Posted Content

PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

TL;DR: PUFFINN is a parameterless LSH-based index for solving the $k$-nearest neighbor problem with probabilistic guarantees, which combines several heuristic ideas known in the literature.
Book Chapter

FRESH: Fréchet Similarity with Hashing

TL;DR: This paper proposes FRESH, an approximate and randomized approach for r-range search that uses a locality-sensitive hashing scheme to detect candidate near neighbors of the query curve, followed by a pruning step based on a cascade of curve simplifications.
Book Chapter

Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors

TL;DR: This work presents a novel index structure called radius-optimized Locality Sensitive Hashing (roLSH), and extensive experimental analysis on real datasets shows the performance benefit of roLSH over existing state-of-the-art LSH techniques.
Journal Article

FJLT-FLSH: More Efficient Fly Locality-Sensitive Hashing Algorithm via FJLT for WMSN IoT Search

TL;DR: The experimental results show that the proposed algorithm has better generalization, accuracy of the search results, and time efficiency when using the Drosophila olfactory nerve to simulate the LSH process.
Journal Article

A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

TL;DR: Proposes a log-based anomaly detection method with efficient neighbor selection, together with an automatic method based on the Silhouette Coefficient that selects a suitable number k of neighbors to improve the accuracy of anomaly detection.
References
Book Chapter

Probability Inequalities for Sums of Bounded Random Variables

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; the bounds are extended to certain sums of dependent random variables such as U statistics.
Proceedings Article

Approximate nearest neighbors: towards removing the curse of dimensionality

TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size $n$ in $\mathbb{R}^d$, which require space that is only polynomial in $n$ and $d$.
Proceedings Article

Locality-sensitive hashing scheme based on p-stable distributions

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the $\ell_p$ norm, based on p-stable distributions, that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case $p<1$.
Journal Article

Universal classes of hash functions

TL;DR: Presents an input-independent, average linear-time algorithm for storage and retrieval on keys, which makes a random choice of hash function from a suitable class of hash functions.
Proceedings Article

Spectral Hashing

TL;DR: The problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard and a spectral method is obtained whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian.