scispace - formally typeset

Locality-sensitive hashing

About: Locality-sensitive hashing (LSH) is a technique that hashes similar inputs into the same buckets with high probability, enabling fast approximate nearest-neighbour search in high-dimensional spaces. Over the lifetime, 1894 publications have been published within this topic, receiving 69362 citations.
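As a rough illustration of the idea, here is a minimal random-hyperplane (SimHash-style) LSH sketch in Python; the dimensions, bit count, and data below are illustrative choices, not drawn from any of the papers listed here:

```python
import numpy as np

# Minimal sketch of locality-sensitive hashing via random hyperplanes
# (SimHash for cosine similarity): nearby vectors tend to get the same bits.
rng = np.random.default_rng(42)
dim, n_bits = 64, 16
planes = rng.standard_normal((n_bits, dim))  # one random hyperplane per bit

def lsh_signature(v):
    """Each bit records which side of a random hyperplane v falls on."""
    return tuple(bool(b) for b in (planes @ v) > 0)

def agree(a, b):
    """Number of matching bits between two signatures."""
    return sum(x == y for x, y in zip(a, b))

base = rng.standard_normal(dim)
near = base + 1e-3 * rng.standard_normal(dim)  # tiny perturbation of base
far = rng.standard_normal(dim)                 # unrelated vector

sig_base, sig_near, sig_far = map(lsh_signature, (base, near, far))
# sig_near shares (almost) all bits with sig_base; sig_far only about half.
```

Close points collide in (almost) every bit, so grouping points by signature yields candidate buckets for approximate nearest-neighbour search without comparing against the whole data set.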


Papers
Proceedings ArticleDOI
24 Jul 2016
TL;DR: Two further algorithmic improvements are introduced: a normal scale (NS) choice of the optimal number of nearest neighbours, and locality-sensitive hashing (LSH) to approximate nearest-neighbour searches, offering the potential for an efficient big data clustering method.
Abstract: We introduce an efficient distributed implementation of nearest neighbour mean shift clustering (NNMS). The computationally intensive nature of NNMS has so far restricted its application to complex data sets where a flexible clustering with non-ellipsoidal clusters would be beneficial. A parallel implementation of the standard serial NNMS algorithm on its own brings insufficient performance gains, so we introduce two further algorithmic improvements: a normal scale (NS) choice of the optimal number of nearest neighbours, and locality-sensitive hashing (LSH) to approximate nearest-neighbour searches. Combining these improvements into a single distributed algorithm, DNNMS, offers the potential for an efficient method for big data clustering.
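To make the NNMS idea concrete, here is a hedged, brute-force sketch of the nearest-neighbour mean shift iteration; the paper's contribution is to replace this exact neighbour search with LSH and distribute it, and the toy data and parameters below are illustrative only:

```python
import numpy as np

# Nearest-neighbour mean shift (NNMS), brute force: each point moves to the
# mean of its k nearest neighbours; iterating drives points toward modes.
rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs (toy data, not the paper's benchmark).
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

def nnms_step(Y, k=10):
    # All pairwise distances, the O(n^2) part that LSH would approximate.
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]  # k nearest indices (incl. self)
    return Y[nearest].mean(axis=1)          # shift each point to their mean

Y = X.copy()
for _ in range(20):
    Y = nnms_step(Y)
# Each blob has now collapsed toward its own density mode.
```

Because every iteration needs each point's k nearest neighbours, an approximate LSH-based search can replace the quadratic distance matrix; clusters then fall out as groups of points that converge to the same mode.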

7 citations

Book ChapterDOI
02 Jun 2009
TL;DR: This paper defines a new index structure, MLR-Index (Multi-Layer Ring-based Index), in a metric space and proposes time- and space-efficient algorithms with high accuracy for approximate nearest-neighbour search over two different types of queries.
Abstract: High-dimensional indexing is widely used for similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, well-known data structures such as the kd-tree, R-tree, and M-tree are known to perform worse over high-dimensional data than a brute-force linear scan. In this paper, we focus on approximate nearest-neighbour search for two different types of queries: r-range search and k-NN search. Adapting the novel concept of a ring structure, we define a new index structure, MLR-Index (Multi-Layer Ring-based Index), in a metric space and propose time- and space-efficient algorithms with high accuracy. Comprehensive experiments comparing against the best-known high-dimensional indexing method, LSH, show that our approach is faster at a similar accuracy and more accurate at a similar response time.

7 citations

Proceedings ArticleDOI
09 Jul 2012
TL;DR: A novel retrieval method called note-based locality-sensitive hashing (NLSH) is presented and combined with pitch-based locality-sensitive hashing (PLSH) to screen candidate fragments and improve query-by-humming quality.
Abstract: Query by humming (QBH) is a technique used for content-based music information retrieval. It remains a challenging unsolved problem due to humming errors. In this paper a novel retrieval method called note-based locality-sensitive hashing (NLSH) is presented and combined with pitch-based locality-sensitive hashing (PLSH) to screen candidate fragments. The method extracts PLSH and NLSH vectors from the database to construct two indexes. In the retrieval phase, it extracts vectors from the query in the same way as during index construction and searches the indexes to obtain a list of candidates. Recursive alignment (RA) is then executed on these surviving candidates. Experiments are conducted on a database of 5,000 MIDI files with the 2010 MIREX-QBH query corpus. The results show that with the combination approach the relative improvements in mean reciprocal rank are 29.7% (humming from anywhere) and 23.8% (humming from the beginning), compared with the current state-of-the-art method.

7 citations

Proceedings ArticleDOI
Smita Wadhwa, Pawan Gupta
09 Jan 2010
TL;DR: DLSH (Distributed Locality Sensitive Hashing) performs better for finding approximate near neighbors at extremely large scales, as it distributes close points onto the same boxes and far points onto different boxes based on projections.
Abstract: In this paper, we present DLSH (Distributed Locality Sensitive Hashing), a similarity-search technology. The huge growth in the size of video content has broken traditional multimedia index hosting and look-up solutions, which cannot scale to current and projected index requirements. DLSH addresses this need for a highly scalable multimedia index: it performs well for finding approximate near neighbors at extremely large scales because it distributes close points onto the same boxes and far points onto different boxes based on projections.
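A toy sketch of the routing idea follows; the sizes and the simple bucket-to-node mapping are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

# DLSH-style routing sketch: an LSH code decides which machine ("box") stores
# a point, so near neighbours tend to land on the same box and a query only
# needs to touch one partition.
rng = np.random.default_rng(1)
dim, n_bits, n_nodes = 32, 8, 4
planes = rng.standard_normal((n_bits, dim))  # projections shared by all nodes

def node_for(v):
    bits = (planes @ v) > 0
    code = int("".join("1" if b else "0" for b in bits), 2)  # LSH bucket id
    return code % n_nodes  # hypothetical bucket-to-machine mapping

p = rng.standard_normal(dim)
q = p + 1e-6 * rng.standard_normal(dim)  # a near-duplicate of p
# p and q fall in the same bucket, hence are routed to the same node.
```

Because routing depends only on the shared projections, a query can be hashed locally and sent straight to the one box likely to hold its near neighbours, instead of being broadcast to every machine.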

7 citations

Proceedings ArticleDOI
26 Aug 2008
TL;DR: This paper first uses exemplar near-duplicates to learn an effective distance measure and incorporates the learned metric into locality-sensitive hashing to achieve fast retrieval, then uses exemplar near-duplicate images to automatically expand the query and further improve retrieval accuracy.
Abstract: In this paper, we propose a novel scheme for near-duplicate image detection, an important problem in a variety of applications. While in general content-based image retrieval an image can be similar to the query image in infinitely many ways, the ways in which near-duplicate images deviate from the reference image are very limited. Based on this observation, we propose to use exemplar near-duplicate images, which can be obtained automatically, to improve the performance of near-duplicate image retrieval. We first use exemplar near-duplicates to learn an effective distance measure and incorporate the learned metric into locality-sensitive hashing to achieve fast retrieval. We then use exemplar near-duplicates to automatically expand the query to further improve retrieval accuracy. Experimental results validate the effectiveness of the proposed algorithms.
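The "learned metric into LSH" step can be sketched generically: if the learned distance has the form d(x, y) = ||L(x - y)||, then hashing Lx with ordinary random-hyperplane LSH makes collisions track the learned distance. Everything below (the random stand-in for L, sizes, data) is a hypothetical illustration, not the paper's code:

```python
import numpy as np

# Folding a learned Mahalanobis-style metric into LSH: transform by L first,
# then apply standard random-hyperplane hashing in the transformed space.
rng = np.random.default_rng(7)
dim, n_bits = 16, 12
L = rng.standard_normal((dim, dim))          # stand-in for a *learned* metric
planes = rng.standard_normal((n_bits, dim))  # random hyperplanes

def metric_lsh(x):
    """Hash L @ x so that collision probability follows ||L(x - y)||."""
    return tuple(bool(b) for b in (planes @ (L @ x)) > 0)

x = rng.standard_normal(dim)
x_dup = x + 1e-6 * rng.standard_normal(dim)  # a near-duplicate of x
# x and x_dup collide under the metric-aware hash.
```

Precomputing L @ x for every database image keeps lookups as cheap as plain LSH while ranking candidates by the learned notion of "near-duplicate" rather than raw Euclidean distance.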

7 citations


Network Information
Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations (84% related)
- Feature extraction: 111.8K papers, 2.1M citations (83% related)
- Convolutional neural network: 74.7K papers, 2M citations (83% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
- Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  43
2022  108
2021  88
2020  110
2019  104
2018  139