Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

Open Access, Posted Content

- 25 Aug 2017 -

TL;DR

The Indyk-Motwani locality-sensitive hashing (LSH) framework is a general technique for constructing a data structure that answers approximate near neighbor queries using a distribution over locality-sensitive hash functions that partition space. This paper shows how to reduce the number of hash function evaluations per query while keeping the number of lookups and distance computations close to that of the original framework.

Abstract:

The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution $\mathcal{H}$ over locality-sensitive hash functions that partition space. For a collection of $n$ points, after preprocessing, the query time is dominated by $O(n^{\rho} \log n)$ evaluations of hash functions from $\mathcal{H}$ and $O(n^{\rho})$ hash table lookups and distance computations, where $\rho \in (0,1)$ is determined by the locality-sensitivity properties of $\mathcal{H}$. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to $O(\log^2 n)$, leaving the query time dominated by $O(n^{\rho})$ distance computations and $O(n^{\rho} \log n)$ additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely matches that of the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006), we are able to reduce the number of additional word-RAM operations to $O(n^\rho)$.
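As a rough illustration of the classic Indyk-Motwani framework that the abstract takes as its baseline (not the paper's improved construction), the sketch below builds $L$ hash tables, each keyed by $k$ concatenated locality-sensitive hash values. Several things here are assumptions made for brevity: bit sampling over binary vectors under Hamming distance is used as the hash family, the names `LSHIndex`, `im_parameters`, and `hamming` are hypothetical, and the parameter formulas are the textbook choices with constants omitted.

```python
import math
import random
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def im_parameters(n, p1, p2):
    """Textbook parameter choices (constants omitted; an assumption here).

    p1: collision probability for near points, p2: for far points (p1 > p2).
    Returns (rho, k, L) with rho = log(1/p1) / log(1/p2).
    """
    rho = math.log(1 / p1) / math.log(1 / p2)
    k = math.ceil(math.log(n) / math.log(1 / p2))  # hash values per table
    L = math.ceil(n ** rho)                        # number of tables
    return rho, k, L


class LSHIndex:
    """Illustrative L-table index using bit sampling (assumed hash family)."""

    def __init__(self, points, k, L, seed=0):
        rng = random.Random(seed)
        dim = len(points[0])
        self.points = points
        # Each table samples k coordinates, so a query evaluates L * k
        # hash functions in total.
        self.projections = [
            [rng.randrange(dim) for _ in range(k)] for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, p in enumerate(points):
            for proj, table in zip(self.projections, self.tables):
                table[self._key(p, proj)].append(idx)

    @staticmethod
    def _key(point, proj):
        # Concatenate the k sampled bits into one bucket key.
        return tuple(point[i] for i in proj)

    def query(self, q, radius):
        """Return the index of some stored point within `radius` of q, or None."""
        for proj, table in zip(self.projections, self.tables):
            for idx in table.get(self._key(q, proj), ()):
                if hamming(self.points[idx], q) <= radius:
                    return idx
        return None


points = [[0] * 16, [1] * 16, [0] * 8 + [1] * 8]
index = LSHIndex(points, k=4, L=3)
print(index.query([0] * 16, radius=0))
```

In this layout a query costs $L \cdot k$ hash evaluations, which with the choices above corresponds to the $O(n^{\rho} \log n)$ term in the abstract; the results summarized there reduce that count while leaving the $L$ table lookups and the distance computations essentially unchanged.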

Citations

Posted Content

PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

Martin Aumüller, +3 more

- 28 Jun 2019 -

arXiv: Data Structures and Algorithms

TL;DR: PUFFINN is a parameterless LSH-based index for solving the $k$-nearest neighbor problem with probabilistic guarantees, which combines several heuristic ideas known in the literature.

Book Chapter

FRESH: Fréchet Similarity with Hashing

Matteo Ceccarello, +2 more

TL;DR: This paper proposes FRESH, an approximate and randomized approach for r-range search that uses a locality-sensitive hashing scheme to detect candidate near neighbors of the query curve, followed by a pruning step based on a cascade of curve simplifications.

Book Chapter

Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors

Omid Jafari, +2 more

TL;DR: This work presents a novel index structure called radius-optimized Locality Sensitive Hashing (roLSH), and extensive experimental analysis on real datasets shows the performance benefit of roLSH over existing state-of-the-art LSH techniques.

Journal Article

FJLT-FLSH: More Efficient Fly Locality-Sensitive Hashing Algorithm via FJLT for WMSN IoT Search

Wenhao Shao, +4 more

- 01 May 2019 -

IEEE Internet of Things Journal

TL;DR: The experimental results show that the proposed algorithm has better generalization, accuracy of the search results, and time efficiency when using the Drosophila olfactory nerve to simulate the LSH process.

Journal Article

A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

Bingming Wang, +2 more

- 02 Jun 2020 -

Scientific Programming

TL;DR: This paper proposes a log-based anomaly detection method with efficient neighbor searching, together with an automatic method based on the Silhouette Coefficient that selects a proper number of neighbors k to improve the accuracy of anomaly detection.

References

Book Chapter

Probability Inequalities for Sums of Bounded Random Variables

Wassily Hoeffding

- 01 Mar 1963 -

Journal of the American Statistical Asso...

TL;DR: In this article, upper bounds are derived for the probability that the sum $S$ of $n$ independent random variables exceeds its mean $ES$ by a positive number $nt$; the bounds are also extended to certain sums of dependent random variables such as U statistics.

Proceedings Article

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk, +1 more

TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size $n$ living in $\mathbb{R}^d$, which require space that is only polynomial in $n$ and $d$.

Proceedings Article

Locality-sensitive hashing scheme based on p-stable distributions

Mayur Datar, +3 more

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the $\ell_p$ norm, based on $p$-stable distributions, that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case $p<1$.

Journal Article

Universal classes of hash functions

J. Lawrence Carter, +1 more

- 01 Apr 1979 -

Journal of Computer and System Sciences

TL;DR: An input-independent, average linear-time algorithm for storage and retrieval on keys that makes a random choice of hash function from a suitable class of hash functions.

Proceedings Article

Spectral Hashing

Yair Weiss, +2 more

TL;DR: The problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP-hard; relaxing the problem yields a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian.

Theory of Computing Systems / Mathemati...

Searchable Encryption with Optimal Locality: Achieving Sublogarithmic Read Efficiency.

Ioannis Demertzis, +2 more

- 01 Jan 2017 -

IACR Cryptology ePrint Archive

Related Papers (5)

Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

On aspects of universality and performance for closed hashing

String hashing for linear probing

Space Efficient Hash Tables With Worst Case Constant Access Time

Searchable Encryption with Optimal Locality: Achieving Sublogarithmic Read Efficiency.