scispace - formally typeset
Search or ask a question
Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
More filters
Proceedings ArticleDOI
26 Jul 2009
TL;DR: A novel shape descriptor based on shape context, which in combination with hierarchical distance based hashing is used for word and graphical pattern based document image indexing and retrieval and the applicability is demonstrated for classification of characters and symbols.
Abstract: In this paper we present a novel shape descriptor based on shape context, which in combination with hierarchical distance based hashing is used for word and graphical pattern based document image indexing and retrieval. The shape descriptor represents the relative arrangement of points sampled on the boundary of the shape of object. We also demonstrate the applicability of the novel shape descriptor for classification of characters and symbols. For indexing, we provide anew formulation for distance based hierarchical locality sensitive hashing. Experiments have yielded promising results.

16 citations

Proceedings Article
01 Jan 2012
TL;DR: An efficient and accurate metagenome clustering approach that uses the locality sensitive hashing (LSH) technique to approximate the computational complexity associated with comparing sequences, and introduces the use of fixed-length, gapless subsequences for improving the sensitivity of the LSH-based similarity function.
Abstract: The new generation of genomic technologies have allowed researchers to determine the collective DNA of organisms (e.g., microbes) co-existing as communities across the ecosystem (e.g., within the human host). There is a need for the computational approaches to analyze and annotate the large volumes of available sequence data from such microbial communities (metagenomes). In this paper, we developed an efficient and accurate metagenome clustering approach that uses the locality sensitive hashing (LSH) technique to approximate the computational complexity associated with comparing sequences. We introduce the use of fixed-length, gapless subsequences for improving the sensitivity of the LSH-based similarity function. We evaluate the performance of our algorithm on two metagenome datasets associated with microbes existing across different human skin locations. Our empirical results show the strength of the developed approach in comparison to three state-of-the-art sequence clustering algorithms with regards to computational efficiency and clustering quality. We also demonstrate practical significance for the developed clustering algorithm, to compare bacterial diversity and structure across different skin locations.

16 citations

Proceedings ArticleDOI
14 Mar 2010
TL;DR: The problem of generating compact signatures as a rate-distortion problem is formulated and the aim is at minimizing the reconstruction error on the squared distances with a constraint on the memory usage.
Abstract: Handling large amounts of data, such as large image databases, requires the use of approximate nearest neighbor search techniques. Recently, Hamming embedding methods such as spectral hashing have addressed the problem of obtaining compact binary codes optimizing the trade-off between the memory usage and the probability of retrieving the true nearest neighbors. In this paper, we formulate the problem of generating compact signatures as a rate-distortion problem. In the spirit of source coding algorithms, we aim at minimizing the reconstruction error on the squared distances with a constraint on the memory usage. The vectors are ranked based on the distance estimates to the query vector. Experiments on image descriptors show a significant improvement over spectral hashing.

16 citations

Proceedings ArticleDOI
18 Apr 2001
TL;DR: This work proposes an index structure, the ANN-tree (approximate nearest neighbor tree), which is demonstrably more efficient than existing structures like the R*-tree and is a preferable index structure for both exact and approximate nearest neighbor searches.
Abstract: We explore the problem of approximate nearest neighbor searches. We propose an index structure, the ANN-tree (approximate nearest neighbor tree) to solve this problem. The ANN-tree supports high accuracy nearest neighbor search. The actual nearest neighbor of a query point can usually be found in the first leaf page accessed. The accuracy increases to near 100% if a second page is accessed. This is not achievable via traditional indexes. Even if an exact nearest neighbor query is desired, the ANN-tree is demonstrably more efficient than existing structures like the R*-tree. This makes the ANN-tree a preferable index structure for both exact and approximate nearest neighbor searches. We present the index in detail and provide experimental results on both real and synthetic data sets.

16 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202343
2022108
202188
2020110
2019104
2018139