Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have appeared on this topic, receiving 69,362 citations.


Papers
01 Sep 2013
TL;DR: Parallel LSH (PLSH), introduced in this paper, is a variant of LSH designed to be extremely efficient: it scales out across multiple nodes and cores and supports high-throughput streaming of new data.
Abstract: Finding nearest neighbors has become an important operation on databases, with applications to text search, multimedia indexing, and many other areas. One popular algorithm for similarity search, especially for high-dimensional data (where spatial indexes like kd-trees do not perform well), is Locality Sensitive Hashing (LSH), an approximation algorithm for finding similar objects. In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH), designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput streaming of new data. Our approach employs several novel ideas, including: a cache-conscious hash table layout, using a 2-level merge algorithm for hash table construction; an efficient algorithm for duplicate elimination during hash-table querying; an insert-optimized hash table structure and an efficient data expiration algorithm for streaming data; and a performance model that accurately estimates performance of the algorithm and can be used to optimize parameter settings. We show that on a workload where we perform similarity search on a dataset of more than 1 billion tweets, with hundreds of millions of new tweets per day, we can achieve query times of 1-2.5 ms. We show that this is an order of magnitude faster than existing indexing schemes, such as inverted indexes. To the best of our knowledge, this is the fastest implementation of LSH, with table construction times up to 3.7× faster and query times 8.3× faster than a basic implementation.
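
For orientation, the following is a minimal, single-machine sketch of the generic multi-table LSH scheme that PLSH parallelizes, using random hyperplane (signed random projection) hash functions. The class name, parameters, and data are illustrative assumptions, not the cache-conscious, distributed implementation described in the paper.

```python
# A minimal, single-machine sketch of multi-table LSH with random hyperplane
# (signed random projection) hash functions. Illustrative only; not the
# cache-conscious, parallel PLSH implementation described in the paper.
import numpy as np
from collections import defaultdict

class SimpleLSH:
    def __init__(self, dim, num_tables=8, bits_per_table=16, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [rng.normal(size=(bits_per_table, dim))
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.data = []

    @staticmethod
    def _key(planes, x):
        # Bucket key: the sign pattern of the projections.
        return (planes @ x > 0).tobytes()

    def insert(self, x):
        x = np.asarray(x, dtype=float)
        idx = len(self.data)
        self.data.append(x)
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, x)].append(idx)

    def query(self, q, k=5):
        q = np.asarray(q, dtype=float)
        # Union of bucket contents across tables, with duplicate elimination,
        # then exact re-ranking of the (hopefully small) candidate set.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), []))
        return sorted(candidates,
                      key=lambda i: np.linalg.norm(self.data[i] - q))[:k]

# Usage: index 1,000 random vectors and look up neighbours of one of them.
lsh = SimpleLSH(dim=64)
vectors = np.random.default_rng(1).normal(size=(1000, 64))
for v in vectors:
    lsh.insert(v)
print(lsh.query(vectors[0]))
```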

17 citations

Proceedings Article
12 Feb 2017
TL;DR: Extensive experiments on two popular tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed boosted complementary hash-tables method achieves strong table complementarity and significantly outperforms the state of the art.
Abstract: Hashing has proven a promising technique for fast nearest neighbor search over massive databases. In many practical tasks, multiple hash tables must be built to reach a desired level of recall. However, existing multi-table hashing methods suffer from heavy table redundancy, lacking strong table complementarity and effective hash code learning. To address the problem, this paper proposes a multi-table learning method which pursues a specified number of complementary and informative hash tables from the perspective of ensemble learning. By regarding each hash table as a neighbor prediction model, the multi-table search procedure boils down to a linear assembly of predictions stemming from multiple tables. Therefore, a sequential updating and learning framework is naturally established in a boosting mechanism, theoretically guaranteeing table complementarity and algorithmic convergence. Furthermore, each boosting round pursues discriminative hash functions for each table by a discrete optimization in the binary code space. Extensive experiments on two popular tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed boosted complementary hash-tables method achieves strong table complementarity and significantly outperforms the state of the art.
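
As a rough illustration of the "linear assembly of predictions" idea, the sketch below scores candidates by a weighted sum of per-table hits; the hash codes, table weights, and data are placeholder assumptions rather than the paper's learned complementary tables.

```python
# Illustrative sketch only: scoring candidates as a weighted (linear) assembly
# of per-table "neighbor predictions", in the spirit of boosted complementary
# hash tables. The codes, weights, and data below are placeholder assumptions.
import numpy as np
from collections import defaultdict

def build_table(codes):
    """Group item ids by their hash code (one hash table)."""
    table = defaultdict(set)
    for idx, code in enumerate(codes):
        table[code].add(idx)
    return table

def assembled_neighbors(query_codes, tables, weights, k=5):
    """Score each candidate by a weighted sum of its per-table bucket hits."""
    scores = defaultdict(float)
    for code, table, w in zip(query_codes, tables, weights):
        for idx in table.get(code, ()):
            scores[idx] += w
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage: three 4-bit tables over 8 items, with boosting-style weights that
# favour the tables found more reliable in earlier rounds (made-up numbers).
rng = np.random.default_rng(0)
item_codes = [tuple(rng.integers(0, 16, size=8)) for _ in range(3)]  # per table
tables = [build_table(codes) for codes in item_codes]
weights = [1.0, 0.7, 0.4]
query_codes = [codes[0] for codes in item_codes]  # query hashes like item 0
print(assembled_neighbors(query_codes, tables, weights))
```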

17 citations

Book ChapterDOI
05 Oct 2015
TL;DR: Experimental validations show the superiority of the proposed technique over state-of-the-art methods in terms of the precision-recall trade-off for a particular code size, demonstrating the potential of this approach for effective morphology-preserving encoding and retrieval in large neuron databases.
Abstract: In this paper, for the first time, we propose a data-driven search and retrieval (hashing) technique for large neuron image databases. The presented method is built upon hashing forests, where multiple unsupervised random trees are used to encode neurons by parsing the neuromorphological feature space into balanced subspaces. We introduce an inverse coding formulation for retrieval of relevant neurons that effectively mitigates the need for pairwise comparisons across the database. Experimental validations show the superiority of our proposed technique over state-of-the-art methods in terms of the precision-recall trade-off for a particular code size. This demonstrates the potential of this approach for effective morphology-preserving encoding and retrieval in large neuron databases.
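
The sketch below illustrates the general hashing-forest idea under stated assumptions: each unsupervised random tree splits the current subset on a random dimension at its median (keeping the subspaces balanced), the root-to-leaf path serves as an item's code, and an inverted (tree, leaf) -> items index stands in for retrieval without pairwise comparisons. It is not the paper's neuron-specific encoding or its inverse coding formulation.

```python
# Rough sketch of a hashing forest: unsupervised random trees with median
# splits, leaf paths as codes, and an inverted index for bucket-based retrieval.
import numpy as np
from collections import defaultdict

def build_tree(X, ids, depth, rng):
    """Recursively split on a random dimension at the median of the subset."""
    if depth == 0 or len(ids) <= 1:
        return None                      # leaf node
    dim = int(rng.integers(X.shape[1]))
    thr = float(np.median(X[ids, dim]))  # median split balances the halves
    left, right = ids[X[ids, dim] <= thr], ids[X[ids, dim] > thr]
    return (dim, thr,
            build_tree(X, left, depth - 1, rng),
            build_tree(X, right, depth - 1, rng))

def leaf_code(x, node, code=()):
    """Sequence of left/right decisions from the root to x's leaf."""
    if node is None:
        return code
    dim, thr, left, right = node
    return (leaf_code(x, left, code + (0,)) if x[dim] <= thr
            else leaf_code(x, right, code + (1,)))

# Toy usage: a forest of 4 depth-4 trees over random stand-in "morphology"
# features, with an inverted index from (tree id, leaf code) to item ids.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
forest = [build_tree(X, np.arange(len(X)), 4, rng) for _ in range(4)]
inverted = defaultdict(set)
for i, x in enumerate(X):
    for t, tree in enumerate(forest):
        inverted[(t, leaf_code(x, tree))].add(i)

query = X[0]
candidates = set().union(*(inverted[(t, leaf_code(query, tree))]
                           for t, tree in enumerate(forest)))
print(sorted(candidates)[:10])
```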

17 citations

Journal ArticleDOI
TL;DR: A novel semi-supervised hashing method that preserves pairwise constraints for both low-dimensional embeddings and binary codes, and can fully preserve pairwise semantic similarities in the binary codes, thus leading to better retrieval performance.
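
As a generic illustration (not the paper's objective or optimizer), the snippet below expresses "preserving pairwise constraints" as a loss over both a continuous embedding and its sign-binarized codes, with a partially labelled similarity matrix standing in for the semi-supervised constraints.

```python
# Illustrative only: a pairwise-constraint loss over a continuous embedding Y
# and its binarized codes. S[i, j] is +1 for must-link pairs, -1 for
# cannot-link pairs, and 0 when the label is unknown (the semi-supervised
# part). All names and the loss form are assumptions made for illustration.
import numpy as np

def pairwise_constraint_loss(Y, S):
    """Encourage normalized inner products to match the labelled similarities."""
    B = np.sign(Y)                      # binary codes from the embedding
    mask = (S != 0)                     # only labelled pairs contribute
    emb_term = ((Y @ Y.T / Y.shape[1] - S) ** 2)[mask].mean()
    code_term = ((B @ B.T / B.shape[1] - S) ** 2)[mask].mean()
    return emb_term + code_term

# Toy usage: random 8-dimensional embeddings for 6 items and a sparse
# similarity matrix covering only a few labelled pairs.
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 8))
S = np.zeros((6, 6))
S[0, 1] = S[1, 0] = 1.0    # items 0 and 1 labelled as similar
S[0, 2] = S[2, 0] = -1.0   # items 0 and 2 labelled as dissimilar
print(pairwise_constraint_loss(Y, S))
```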

17 citations

Posted Content
TL;DR: This paper analyzes a different method of sparsification that better models approaches such as Locality Sensitive Hashing for accelerating nearest neighbor computations, and extends the use of the facility location problem to a broader family of similarities.
Abstract: The facility location problem is widely used for summarizing large datasets and has additional applications in sensor placement, image retrieval, and clustering. One difficulty of this problem is that submodular optimization algorithms require the calculation of pairwise benefits for all items in the dataset. This is infeasible for large problems, so recent work proposed to calculate only nearest neighbor benefits. One limitation is that several strong assumptions were invoked to obtain provable approximation guarantees. In this paper we establish that these extra assumptions are not necessary: solving the sparsified problem will be almost optimal under the standard assumptions of the problem. We then analyze a different method of sparsification that is a better model for methods such as Locality Sensitive Hashing to accelerate the nearest neighbor computations, and extend the use of the problem to a broader family of similarities. We validate our approach by demonstrating that it rapidly generates interpretable summaries.
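
To make the sparsification concrete, here is a hedged sketch of greedy facility-location summarization on a benefit matrix restricted to each item's k nearest neighbours; the neighbours are found by brute force here, whereas in practice LSH would supply them, and the parameter names and similarity function are illustrative assumptions.

```python
# Sketch of greedy facility-location summarization on a sparsified benefit
# matrix: each item keeps similarities only to its k nearest neighbours.
import numpy as np

def sparsified_benefits(X, k=10):
    """Similarity of each item to its k nearest neighbours only (others are 0)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sim = np.zeros_like(d)
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]            # k nearest items (includes i itself)
        sim[i, nn] = 1.0 / (1.0 + d[i, nn])  # benefit item i gets from facility j
    return sim

def greedy_facility_location(sim, budget):
    """Pick `budget` facilities greedily to maximize sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    chosen, coverage = [], np.zeros(n)
    for _ in range(budget):
        # Marginal gain of adding each candidate facility j.
        gains = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        j = int(np.argmax(gains))
        chosen.append(j)
        coverage = np.maximum(coverage, sim[:, j])
    return chosen

# Toy usage: summarize 100 random points with 5 representatives.
X = np.random.default_rng(0).normal(size=(100, 5))
print(greedy_facility_location(sparsified_benefits(X), budget=5))
```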

17 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139