Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published on this topic, receiving 69,362 citations.


Papers
Proceedings ArticleDOI
10 Jan 2015
TL;DR: Two methods are proposed for reducing the number of item-pair comparisons through simple clustering, where similar items tend to fall in the same cluster: one uses Locality Sensitive Hashing (LSH) and the other uses item consumption cardinality.
Abstract: Item-based Collaborative Filtering (CF) models offer good recommendations with low latency. Still, constructing such models is often slow, requiring the comparison of all item pairs and then caching, for each item, the list of its most similar items. In this paper we suggest methods for reducing the number of item-pair comparisons through simple clustering, where similar items tend to be in the same cluster. We propose two methods, one that uses Locality Sensitive Hashing (LSH) and another that uses item consumption cardinality. We evaluate the two methods, demonstrating that the cardinality-based method reduces the computation time dramatically without harming accuracy.
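The clustering idea above can be illustrated with a small sketch. The following is a minimal, self-contained example (not the paper's implementation) of MinHash-based LSH banding: items are bucketed by the sets of users who consumed them, and only item pairs that share at least one bucket are compared. The item names, signature length, and banding parameters are all invented for illustration.

```python
# Minimal sketch of MinHash LSH banding to limit item-pair comparisons.
# Hypothetical data and parameters; not the paper's method.
import random
from collections import defaultdict
from itertools import combinations

random.seed(0)
NUM_HASHES = 16          # MinHash signature length (assumed)
BANDS, ROWS = 8, 2       # items agreeing on all rows of a band share a bucket

# toy data: item -> set of users who consumed it
items = {
    "item_a": {1, 2, 3, 4},
    "item_b": {2, 3, 4, 5},
    "item_c": {7, 8, 9},
    "item_d": {7, 8, 10},
}

# random hash functions h(x) = (a*x + b) mod p used by MinHash
P = 2_147_483_647
hash_params = [(random.randrange(1, P), random.randrange(0, P)) for _ in range(NUM_HASHES)]

def minhash(users):
    """MinHash signature: minimum hash value of the user set under each hash function."""
    return [min(((a * u + b) % P) for u in users) for a, b in hash_params]

# bucket items by their band signatures
buckets = defaultdict(set)
signatures = {item: minhash(users) for item, users in items.items()}
for item, sig in signatures.items():
    for band in range(BANDS):
        key = (band, tuple(sig[band * ROWS:(band + 1) * ROWS]))
        buckets[key].add(item)

# only compare item pairs that share at least one bucket
candidate_pairs = set()
for members in buckets.values():
    candidate_pairs.update(combinations(sorted(members), 2))

print("candidate pairs:", sorted(candidate_pairs))
for a, b in sorted(candidate_pairs):
    jaccard = len(items[a] & items[b]) / len(items[a] | items[b])
    print(a, b, "Jaccard =", jaccard)
```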

9 citations

Proceedings ArticleDOI
01 Nov 2015
TL;DR: Experiments on the challenging Digital Database for Screening Mammography (DDSM) dataset demonstrated the performance of the proposed CBIR method for retrieving the most relevant mammograms in a large-scale dataset.
Abstract: Content-based image retrieval (CBIR) is an essential task for providing the most similar images, especially in the context of medical imaging for diagnosis aid. In this paper, we propose a CBIR method for large-scale mammogram datasets. To extract region of interest (ROI) signatures, four moment descriptors were defined after computing the curvelet coefficients for each level of the ROI. Then, an unsupervised technique based on locality sensitive hashing was adopted for indexing the extracted signatures. The main contribution of the suggested method resides in the variance-based filtering within the retrieval phase, which extracts the suitable buckets in the shortest time while optimizing the memory requirement. After that, an accurate search in Hamming space is performed to identify the ROIs similar to the query case. Experiments on the challenging Digital Database for Screening Mammography (DDSM) dataset demonstrated the performance of the proposed method for retrieving the most relevant mammograms in a large-scale dataset. It achieves a mean retrieval precision of 97.1% over a total of 11,218 mammogram ROIs.
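As a rough illustration of the index-then-search pipeline described above, the sketch below indexes synthetic descriptor vectors with random-hyperplane (sign-based) LSH codes and ranks candidates by Hamming distance to the query code. It is a generic stand-in, not the paper's curvelet/moment pipeline or its variance-based bucket filtering; the descriptor dimension, code length, and data are assumed.

```python
# Minimal sketch: random-hyperplane LSH codes + Hamming-distance retrieval.
# Synthetic descriptors and invented parameters; not the paper's pipeline.
import numpy as np

rng = np.random.default_rng(0)
DIM, NUM_BITS = 16, 32                      # descriptor size and code length (assumed)
planes = rng.standard_normal((NUM_BITS, DIM))

def binary_code(x):
    """Project the descriptor onto random hyperplanes and keep the sign bits."""
    return (planes @ x > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

# index: signature id -> binary code (toy ROI descriptors)
database = {i: rng.standard_normal(DIM) for i in range(100)}
index = {i: binary_code(v) for i, v in database.items()}

# query: rank database entries by Hamming distance between codes
query = rng.standard_normal(DIM)
q_code = binary_code(query)
ranked = sorted(index, key=lambda i: hamming(index[i], q_code))
print("top-5 candidates:", ranked[:5])
```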

9 citations

Book ChapterDOI
30 Sep 2020
TL;DR: This work presents a novel index structure called radius-optimized Locality Sensitive Hashing (roLSH), and extensive experimental analysis on real datasets shows the performance benefit of roLSH over existing state-of-the-art LSH techniques.
Abstract: Similarity search in high-dimensional spaces is an important task for many multimedia applications. Due to the notorious curse of dimensionality, approximate nearest neighbor techniques are preferred over exact searching techniques since they can return good-enough results at much better speed. Locality Sensitive Hashing (LSH) is a very popular random hashing technique for finding approximate nearest neighbors. Existing state-of-the-art LSH techniques that aim to improve overall performance mainly focus on minimizing the total number of I/Os while sacrificing overall processing time. The main time-consuming step in LSH techniques is finding neighboring points in the projected spaces. We present a novel index structure called radius-optimized Locality Sensitive Hashing (roLSH). With the help of sampling techniques and neural networks, we present two techniques to find neighboring points in projected spaces efficiently, without sacrificing the accuracy of the results. Our extensive experimental analysis on real datasets shows the performance benefit of roLSH over existing state-of-the-art LSH techniques.
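To make the notion of "finding neighboring points in projected spaces" concrete, here is a minimal sketch of Euclidean LSH with bucket probing: points are hashed with h(x) = floor((a·x + b)/w), and a query probes its own bucket plus nearby buckets within a fixed offset radius before exact re-ranking. This is the generic search step that roLSH optimizes, not roLSH itself; the dimension, number of hash functions, bucket width, and data are assumed.

```python
# Minimal sketch of Euclidean LSH with neighboring-bucket probing.
# Invented parameters and synthetic data; not the roLSH algorithm.
import numpy as np
from collections import defaultdict
from itertools import product

rng = np.random.default_rng(1)
DIM, K, W = 8, 3, 4.0                       # dimension, hashes per table, bucket width (assumed)
A = rng.standard_normal((K, DIM))
B = rng.uniform(0, W, size=K)

def bucket(x):
    """Projected-space bucket id: floor((a.x + b) / w) for each of the K hashes."""
    return tuple(np.floor((A @ x + B) / W).astype(int))

# build one hash table over toy data
points = rng.standard_normal((500, DIM))
table = defaultdict(list)
for idx, p in enumerate(points):
    table[bucket(p)].append(idx)

def query(q, radius=1):
    """Collect candidates from the query bucket and all buckets within `radius` offsets,
    then re-rank them by exact Euclidean distance."""
    base = bucket(q)
    candidates = set()
    for offset in product(range(-radius, radius + 1), repeat=K):
        candidates.update(table.get(tuple(b + o for b, o in zip(base, offset)), []))
    return sorted(candidates, key=lambda i: np.linalg.norm(points[i] - q))[:5]

print("approximate 5-NN indices:", query(rng.standard_normal(DIM)))
```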

9 citations

Proceedings ArticleDOI
09 Jun 2021
TL;DR: This paper proposes BiDens, a novel densification method that fills a sketch's empty bins with values from its non-empty bins, in either the forward or backward direction, more efficiently than existing densification methods.
Abstract: As an efficient tool for approximate similarity computation and search, Locality Sensitive Hashing (LSH) has been widely used in many research areas, including databases, data mining, information retrieval, and machine learning. Classical LSH methods typically need to perform hundreds or even thousands of hashing operations when computing the LSH sketch for each input item (e.g., a set or a vector); this complexity is too expensive and even impractical for applications that must process data in real time. To address this issue, several fast methods such as OPH and BCWS have been proposed to compute LSH sketches efficiently; however, these methods may generate many sketches with empty bins, which can introduce large errors in similarity estimation and also limit their usage for fast similarity search. To solve this issue, we propose a novel densification method, BiDens. Compared with existing densification methods, BiDens fills a sketch's empty bins with values from its non-empty bins, in either the forward or backward direction, more efficiently. Furthermore, it densifies empty bins so as to satisfy the densification principle (i.e., the LSH property). Theoretical analysis and experimental results on similarity estimation, fast similarity search, and kernel linearization using real-world datasets demonstrate that BiDens is up to 106 times faster than state-of-the-art methods while achieving the same or even better accuracy.
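The densification problem described above can be illustrated with a short sketch: one-permutation hashing (OPH) leaves some bins of the sketch empty, and a densification pass copies values from non-empty bins into the empty ones (here, a plain forward scan with wrap-around). This is a simplified rotation-style densification for illustration only, not the BiDens algorithm; the bin count, universe size, and sample sets are invented.

```python
# Minimal sketch of one-permutation hashing plus a simple forward-scan
# densification of empty bins. Illustrative only; not the BiDens method.
import random

random.seed(0)
NUM_BINS = 8
UNIVERSE = 1000

# one fixed random permutation of the universe, shared by all sets
perm = list(range(UNIVERSE))
random.shuffle(perm)
rank = {v: i for i, v in enumerate(perm)}

def oph_sketch(s):
    """One-permutation hashing: split the permuted universe into NUM_BINS ranges
    and keep the minimum permuted rank falling in each range (None if empty)."""
    bins = [None] * NUM_BINS
    width = UNIVERSE // NUM_BINS
    for x in s:
        r = rank[x]
        b = min(r // width, NUM_BINS - 1)
        if bins[b] is None or r < bins[b]:
            bins[b] = r
    return bins

def densify(bins):
    """Fill each empty bin with the value of the nearest non-empty bin found by
    scanning forward with wrap-around (rotation-style densification)."""
    if all(v is None for v in bins):
        return list(bins)
    out = list(bins)
    for i in range(NUM_BINS):
        j = i
        while out[i] is None:
            j = (j + 1) % NUM_BINS
            out[i] = bins[j]
    return out

a = densify(oph_sketch(set(random.sample(range(UNIVERSE), 30))))
b = densify(oph_sketch(set(random.sample(range(UNIVERSE), 30))))
print("matching bins:", sum(x == y for x, y in zip(a, b)), "of", NUM_BINS)
```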

9 citations

Patent
28 Feb 2013
TL;DR: An approximate nearest neighbor search device is described comprising: a database storage unit that, given a plurality of points represented as vector data, computes a hash index by applying a hash function to each point and stores the points in a multi-dimensional hash table whose bins partition the space into regions; and a search range establishment unit that applies the hash function to a query, establishes the query's location within the space, estimates the distance from the query to each region, and selects the regions to be searched on the basis of those estimates.
Abstract: An objective of the present invention is to implement an approximate nearest neighbor search rapidly and with high precision by appropriately reducing the number of nearest neighbor candidates. An approximate nearest neighbor search device is provided which comprises: a database storage unit which, when a plurality of points represented as vector data is inputted, computes a hash index by applying a hash function to each point, and stores each point in a multi-dimensional hash table by projecting it into a multi-dimensional space which is segmented into a plurality of regions by the multi-dimensional hash table bins; a search range establishment unit which, when a query is inputted, applies the hash function to the query, establishes the location of the query within the space, establishes estimated values of the distance from the query to each region within the space, and establishes the regions to be searched on the basis of those estimates; and a nearest neighbor establishment unit which calculates the distance from each point within the search regions to the query, and takes the point nearest to the query as the query's nearest neighbor. The search range establishment unit refers to the index of each region to derive a representative point of the region, establishes the estimated value on the basis of the distance between the query and each representative point, and applies a branch-and-bound technique, excluding regions which cannot be regions to be searched, to establish the regions to be searched.
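The search-range idea in this patent abstract can be sketched on a toy 2-D grid: points are bucketed into grid regions, each region provides a cheap lower bound on its distance to the query (here the distance to the cell's bounding box rather than to a representative point, which is a simplification), and regions whose bound exceeds the best exact distance found so far are pruned in branch-and-bound fashion. Grid size and data are invented for illustration.

```python
# Minimal sketch of grid-region branch-and-bound nearest neighbor search.
# Toy 2-D data and invented cell size; a simplification of the patent's scheme.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
CELL = 0.25                                   # grid cell width (assumed)
points = rng.random((300, 2))                 # toy points in the unit square

# hash each point to its grid cell (the "multi-dimensional hash table")
regions = defaultdict(list)
for idx, p in enumerate(points):
    regions[tuple((p // CELL).astype(int))].append(idx)

def region_lower_bound(q, cell):
    """Lower bound on the distance from q to any point inside the cell."""
    lo = np.array(cell) * CELL
    hi = lo + CELL
    nearest = np.clip(q, lo, hi)              # closest point of the cell's box to q
    return float(np.linalg.norm(q - nearest))

def nearest_neighbor(q):
    """Visit regions in order of their lower bound; stop once the bound already
    exceeds the best exact distance found so far (branch and bound)."""
    best_idx, best_dist = None, float("inf")
    for cell in sorted(regions, key=lambda c: region_lower_bound(q, c)):
        if region_lower_bound(q, cell) > best_dist:
            break                              # remaining regions cannot contain the NN
        for idx in regions[cell]:
            d = float(np.linalg.norm(points[idx] - q))
            if d < best_dist:
                best_idx, best_dist = idx, d
    return best_idx, best_dist

print(nearest_neighbor(np.array([0.5, 0.5])))
```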

9 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139