Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have appeared on this topic, receiving 69,362 citations.


Papers
01 Sep 2013
TL;DR: Parallel LSH (PLSH), introduced in this paper, is a variant of LSH designed to be extremely efficient: it scales out across multiple nodes and cores and supports high-throughput streaming of new data.
Abstract: Finding nearest neighbors has become an important operation on databases, with applications to text search, multimedia indexing, and many other areas. One popular algorithm for similarity search, especially for high-dimensional data (where spatial indexes like kd-trees do not perform well), is Locality Sensitive Hashing (LSH), an approximation algorithm for finding similar objects. In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH), designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput streaming of new data. Our approach employs several novel ideas, including: a cache-conscious hash table layout, using a 2-level merge algorithm for hash table construction; an efficient algorithm for duplicate elimination during hash-table querying; an insert-optimized hash table structure and an efficient data expiration algorithm for streaming data; and a performance model that accurately estimates performance of the algorithm and can be used to optimize parameter settings. We show that on a workload where we perform similarity search on a dataset of more than 1 billion tweets, with hundreds of millions of new tweets per day, we can achieve query times of 1-2.5 ms. We show that this is an order of magnitude faster than existing indexing schemes, such as inverted indexes. To the best of our knowledge, this is the fastest implementation of LSH, with table construction times up to 3.7× faster and query times 8.3× faster than a basic implementation.
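
For orientation, the following is a minimal, single-machine sketch of the generic multi-table LSH scheme that PLSH parallelizes, using random hyperplane (signed random projection) hash functions. The class name, parameters, and data are illustrative assumptions, not the cache-conscious, distributed implementation described in the paper.

```python
# A minimal, single-machine sketch of multi-table LSH with random hyperplane
# (signed random projection) hash functions. Illustrative only; not the
# cache-conscious, parallel PLSH implementation described in the paper.
import numpy as np
from collections import defaultdict

class SimpleLSH:
    def __init__(self, dim, num_tables=8, bits_per_table=16, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [rng.normal(size=(bits_per_table, dim))
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.data = []

    @staticmethod
    def _key(planes, x):
        # Bucket key: the sign pattern of the projections.
        return (planes @ x > 0).tobytes()

    def insert(self, x):
        x = np.asarray(x, dtype=float)
        idx = len(self.data)
        self.data.append(x)
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, x)].append(idx)

    def query(self, q, k=5):
        q = np.asarray(q, dtype=float)
        # Union of bucket contents across tables, with duplicate elimination,
        # then exact re-ranking of the (hopefully small) candidate set.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), []))
        return sorted(candidates,
                      key=lambda i: np.linalg.norm(self.data[i] - q))[:k]

# Usage: index 1,000 random vectors and look up neighbours of one of them.
lsh = SimpleLSH(dim=64)
vectors = np.random.default_rng(1).normal(size=(1000, 64))
for v in vectors:
    lsh.insert(v)
print(lsh.query(vectors[0]))
```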

17 citations

Proceedings Article
12 Feb 2017
TL;DR: Extensive experiments on two popular tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed boosted complementary hash-tables method achieves strong table complementarity and significantly outperforms the state of the art.
Abstract: Hashing has proven a promising technique for fast nearest neighbor search over massive databases. In many practical tasks, multiple hash tables must be built to reach a desired level of recall. However, existing multi-table hashing methods suffer from heavy table redundancy, lacking strong table complementarity and effective hash code learning. To address the problem, this paper proposes a multi-table learning method which pursues a specified number of complementary and informative hash tables from the perspective of ensemble learning. By regarding each hash table as a neighbor prediction model, the multi-table search procedure boils down to a linear assembly of predictions stemming from multiple tables. Therefore, a sequential updating and learning framework is naturally established in a boosting mechanism, theoretically guaranteeing table complementarity and algorithmic convergence. Furthermore, each boosting round pursues discriminative hash functions for each table by a discrete optimization in the binary code space. Extensive experiments on two popular tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed boosted complementary hash-tables method achieves strong table complementarity and significantly outperforms the state of the art.
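
As a rough illustration of the "linear assembly of predictions" idea, the sketch below scores candidates by a weighted sum of per-table hits; the hash codes, table weights, and data are placeholder assumptions rather than the paper's learned complementary tables.

```python
# Illustrative sketch only: scoring candidates as a weighted (linear) assembly
# of per-table "neighbor predictions", in the spirit of boosted complementary
# hash tables. The codes, weights, and data below are placeholder assumptions.
import numpy as np
from collections import defaultdict

def build_table(codes):
    """Group item ids by their hash code (one hash table)."""
    table = defaultdict(set)
    for idx, code in enumerate(codes):
        table[code].add(idx)
    return table

def assembled_neighbors(query_codes, tables, weights, k=5):
    """Score each candidate by a weighted sum of its per-table bucket hits."""
    scores = defaultdict(float)
    for code, table, w in zip(query_codes, tables, weights):
        for idx in table.get(code, ()):
            scores[idx] += w
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage: three 4-bit tables over 8 items, with boosting-style weights that
# favour the tables found more reliable in earlier rounds (made-up numbers).
rng = np.random.default_rng(0)
item_codes = [tuple(rng.integers(0, 16, size=8)) for _ in range(3)]  # per table
tables = [build_table(codes) for codes in item_codes]
weights = [1.0, 0.7, 0.4]
query_codes = [codes[0] for codes in item_codes]  # query hashes like item 0
print(assembled_neighbors(query_codes, tables, weights))
```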

17 citations

Book ChapterDOI
05 Oct 2015
TL;DR: Experimental validations show the superiority of the proposed technique over state-of-the-art methods in terms of the precision-recall trade-off for a particular code size, demonstrating the potential of this approach for effective morphology-preserving encoding and retrieval in large neuron databases.
Abstract: In this paper, for the first time, we propose a data-driven search and retrieval (hashing) technique for large neuron image databases. The presented method is built upon hashing forests, where multiple unsupervised random trees are used to encode neurons by parsing the neuromorphological feature space into balanced subspaces. We introduce an inverse coding formulation for retrieval of relevant neurons that effectively mitigates the need for pairwise comparisons across the database. Experimental validations show the superiority of our proposed technique over state-of-the-art methods in terms of the precision-recall trade-off for a particular code size. This demonstrates the potential of this approach for effective morphology-preserving encoding and retrieval in large neuron databases.
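
The sketch below illustrates the general hashing-forest idea under stated assumptions: each unsupervised random tree splits the current subset on a random dimension at its median (keeping the subspaces balanced), the root-to-leaf path serves as an item's code, and an inverted (tree, leaf) -> items index stands in for retrieval without pairwise comparisons. It is not the paper's neuron-specific encoding or its inverse coding formulation.

```python
# Rough sketch of a hashing forest: unsupervised random trees with median
# splits, leaf paths as codes, and an inverted index for bucket-based retrieval.
import numpy as np
from collections import defaultdict

def build_tree(X, ids, depth, rng):
    """Recursively split on a random dimension at the median of the subset."""
    if depth == 0 or len(ids) <= 1:
        return None                      # leaf node
    dim = int(rng.integers(X.shape[1]))
    thr = float(np.median(X[ids, dim]))  # median split balances the halves
    left, right = ids[X[ids, dim] <= thr], ids[X[ids, dim] > thr]
    return (dim, thr,
            build_tree(X, left, depth - 1, rng),
            build_tree(X, right, depth - 1, rng))

def leaf_code(x, node, code=()):
    """Sequence of left/right decisions from the root to x's leaf."""
    if node is None:
        return code
    dim, thr, left, right = node
    return (leaf_code(x, left, code + (0,)) if x[dim] <= thr
            else leaf_code(x, right, code + (1,)))

# Toy usage: a forest of 4 depth-4 trees over random stand-in "morphology"
# features, with an inverted index from (tree id, leaf code) to item ids.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
forest = [build_tree(X, np.arange(len(X)), 4, rng) for _ in range(4)]
inverted = defaultdict(set)
for i, x in enumerate(X):
    for t, tree in enumerate(forest):
        inverted[(t, leaf_code(x, tree))].add(i)

query = X[0]
candidates = set().union(*(inverted[(t, leaf_code(query, tree))]
                           for t, tree in enumerate(forest)))
print(sorted(candidates)[:10])
```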

17 citations

Journal ArticleDOI
TL;DR: A novel semi-supervised hashing method that preserves pairwise constraints for both low-dimensional embeddings and binary codes, and can fully preserve pairwise semantic similarities in the binary codes, thus leading to better retrieval performance.
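
As a generic illustration (not the paper's objective or optimizer), the snippet below expresses "preserving pairwise constraints" as a loss over both a continuous embedding and its sign-binarized codes, with a partially labelled similarity matrix standing in for the semi-supervised constraints.

```python
# Illustrative only: a pairwise-constraint loss over a continuous embedding Y
# and its binarized codes. S[i, j] is +1 for must-link pairs, -1 for
# cannot-link pairs, and 0 when the label is unknown (the semi-supervised
# part). All names and the loss form are assumptions made for illustration.
import numpy as np

def pairwise_constraint_loss(Y, S):
    """Encourage normalized inner products to match the labelled similarities."""
    B = np.sign(Y)                      # binary codes from the embedding
    mask = (S != 0)                     # only labelled pairs contribute
    emb_term = ((Y @ Y.T / Y.shape[1] - S) ** 2)[mask].mean()
    code_term = ((B @ B.T / B.shape[1] - S) ** 2)[mask].mean()
    return emb_term + code_term

# Toy usage: random 8-dimensional embeddings for 6 items and a sparse
# similarity matrix covering only a few labelled pairs.
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 8))
S = np.zeros((6, 6))
S[0, 1] = S[1, 0] = 1.0    # items 0 and 1 labelled as similar
S[0, 2] = S[2, 0] = -1.0   # items 0 and 2 labelled as dissimilar
print(pairwise_constraint_loss(Y, S))
```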

17 citations

Posted Content
TL;DR: This paper analyzes a different method of sparsification that better models approaches such as Locality Sensitive Hashing for accelerating nearest neighbor computations, and extends the use of the facility location problem to a broader family of similarities.
Abstract: The facility location problem is widely used for summarizing large datasets and has additional applications in sensor placement, image retrieval, and clustering. One difficulty of this problem is that submodular optimization algorithms require the calculation of pairwise benefits for all items in the dataset. This is infeasible for large problems, so recent work proposed to calculate only nearest neighbor benefits. One limitation is that several strong assumptions were invoked to obtain provable approximation guarantees. In this paper we establish that these extra assumptions are not necessary: solving the sparsified problem will be almost optimal under the standard assumptions of the problem. We then analyze a different method of sparsification that is a better model for methods such as Locality Sensitive Hashing to accelerate the nearest neighbor computations, and extend the use of the problem to a broader family of similarities. We validate our approach by demonstrating that it rapidly generates interpretable summaries.
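
To make the sparsification concrete, here is a hedged sketch of greedy facility-location summarization on a benefit matrix restricted to each item's k nearest neighbours; the neighbours are found by brute force here, whereas in practice LSH would supply them, and the parameter names and similarity function are illustrative assumptions.

```python
# Sketch of greedy facility-location summarization on a sparsified benefit
# matrix: each item keeps similarities only to its k nearest neighbours.
import numpy as np

def sparsified_benefits(X, k=10):
    """Similarity of each item to its k nearest neighbours only (others are 0)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sim = np.zeros_like(d)
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]            # k nearest items (includes i itself)
        sim[i, nn] = 1.0 / (1.0 + d[i, nn])  # benefit item i gets from facility j
    return sim

def greedy_facility_location(sim, budget):
    """Pick `budget` facilities greedily to maximize sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    chosen, coverage = [], np.zeros(n)
    for _ in range(budget):
        # Marginal gain of adding each candidate facility j.
        gains = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        j = int(np.argmax(gains))
        chosen.append(j)
        coverage = np.maximum(coverage, sim[:, j])
    return chosen

# Toy usage: summarize 100 random points with 5 representatives.
X = np.random.default_rng(0).normal(size=(100, 5))
print(greedy_facility_location(sparsified_benefits(X), budget=5))
```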

17 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139