Open Access
Posted Content
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
TLDR
The Indyk-Motwani locality-sensitive hashing (LSH) framework is a general technique for constructing a data structure that answers approximate near neighbor queries by using a distribution over locality-sensitive hash functions that partition space.
Abstract:
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution $\mathcal{H}$ over locality-sensitive hash functions that partition space. For a collection of $n$ points, after preprocessing, the query time is dominated by $O(n^{\rho} \log n)$ evaluations of hash functions from $\mathcal{H}$ and $O(n^{\rho})$ hash table lookups and distance computations where $\rho \in (0,1)$ is determined by the locality-sensitivity properties of $\mathcal{H}$. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to $O(\log^2 n)$, leaving the query time dominated by $O(n^{\rho})$ distance computations and $O(n^{\rho} \log n)$ additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely matches the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to $O(n^\rho)$.
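To make the cost model above concrete, here is a minimal sketch of the classical Indyk-Motwani framework, not the faster frameworks proposed in this paper. It assumes Hamming space with bit-sampling hash functions as the family $\mathcal{H}$ (the abstract leaves $\mathcal{H}$ generic); the class name `BitSamplingLSH`, the helper `hamming`, and the parameters `K` (hash functions concatenated per table) and `L` (number of tables) are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates in which bit vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Sketch of the classical Indyk-Motwani LSH index in Hamming space.

    Assumption (for illustration only): the family H is bit sampling,
    h(x) = x[i] for a coordinate i drawn uniformly from range(dim).
    Each of the L tables is keyed by the concatenation of K such hashes.
    """

    def __init__(self, dim, K, L, seed=0):
        rng = random.Random(seed)
        # K sampled coordinates per table, i.e. K * L hash functions in total.
        self.coords = [[rng.randrange(dim) for _ in range(K)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = []

    def _key(self, x, t):
        # Concatenate the K bit-sampling hashes of table t into one bucket key.
        return tuple(x[i] for i in self.coords[t])

    def insert(self, x):
        idx = len(self.points)
        self.points.append(x)
        for t, table in enumerate(self.tables):
            table[self._key(x, t)].append(idx)

    def query(self, q, r):
        """Scan q's bucket in each table and return (idx, n_checked), where idx
        is the first stored point found within Hamming distance r of q (or
        None) and n_checked counts the distinct candidates compared to q."""
        checked = set()
        for t, table in enumerate(self.tables):
            # .get avoids creating empty buckets in the defaultdict.
            for idx in table.get(self._key(q, t), []):
                if idx in checked:
                    continue  # already compared via an earlier table
                checked.add(idx)
                if hamming(self.points[idx], q) <= r:
                    return idx, len(checked)
        return None, len(checked)
```

After inserting bit vectors with `index.insert(x)`, a call to `index.query(q, r)` scans at most `L` buckets. With $K \approx \log_{1/p_2} n$ (where $p_2$ is the collision probability of a single hash function for far points) and $L \approx n^{\rho}$, as in the standard analysis, a query evaluates up to $K \cdot L = O(n^{\rho} \log n)$ hash functions, which is the cost the frameworks described in the abstract reduce. The classical analysis also stops after inspecting a bounded number of candidates (on the order of $L$); this sketch omits that cap.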
Citations
Posted Content
PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors
TL;DR: PUFFINN is a parameterless LSH-based index for solving the $k$-nearest neighbor problem with probabilistic guarantees; it combines several heuristic ideas known in the literature.
Book Chapter
FRESH: Fréchet Similarity with Hashing
TL;DR: This paper proposes FRESH, an approximate and randomized approach for r-range search that leverages a locality-sensitive hashing scheme for detecting candidate near neighbors of the query curve, followed by a pruning step based on a cascade of curve simplifications.
Book Chapter
Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors
TL;DR: This work presents a novel index structure called radius-optimized Locality Sensitive Hashing (roLSH), and extensive experimental analysis on real datasets shows the performance benefit of roLSH over existing state-of-the-art LSH techniques.
Journal Article
FJLT-FLSH: More Efficient Fly Locality-Sensitive Hashing Algorithm via FJLT for WMSN IoT Search
TL;DR: Experimental results show that the proposed algorithm, which uses the Drosophila olfactory nerve to simulate the LSH process, achieves better generalization, search accuracy, and time efficiency.
Journal Article
A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection
Bingming Wang, Shi Ying, Zhe Yang, et al.
TL;DR: The authors propose a log-based anomaly detection method with efficient neighbor searching, together with an automatic method based on the Silhouette Coefficient that selects a suitable number $k$ of neighbors to improve the accuracy of anomaly detection.
References
Book Chapter
Probability Inequalities for Sums of Bounded Random Variables
TL;DR: Upper bounds are derived for the probability that the sum $S$ of $n$ independent bounded random variables exceeds its mean $ES$ by a positive number $nt$; analogous inequalities are then obtained for certain sums of dependent random variables, such as U-statistics.
Proceedings Article
Approximate nearest neighbors: towards removing the curse of dimensionality
Piotr Indyk, Rajeev Motwani
TL;DR: The authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces which, for data sets of size $n$ in $\mathbb{R}^d$, require space that is only polynomial in $n$ and $d$.
Proceedings Article
Locality-sensitive hashing scheme based on p-stable distributions
TL;DR: The paper presents a novel locality-sensitive hashing scheme for the approximate nearest neighbor problem under the $\ell_p$ norm, based on $p$-stable distributions; it improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case $p < 1$.
Journal Article
Universal classes of hash functions
TL;DR: The paper gives an input-independent, average linear-time algorithm for storing and retrieving keys that makes a random choice of hash function from a suitable class of hash functions.
Proceedings Article
Spectral Hashing
TL;DR: Finding the best code for a given dataset is closely related to graph partitioning and can be shown to be NP-hard; relaxing the problem yields a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian.