Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have appeared on this topic, receiving 69,362 citations.


Papers
Proceedings ArticleDOI
10 Jan 2016
TL;DR: A new construction of locality-sensitive hash functions for Hamming space that is covering, in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r, which avoids the problem of false negatives at little or no cost in efficiency.
Abstract: We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c > 1, comes close to that of the best possible data-independent LSH without the covering guarantee, namely, the seminal LSH construction of Indyk and Motwani (FOCS '98). The efficiency of the new construction essentially matches their bound if cr = log(n)/k, where n is the number of points in the data set and k ∈ N, and differs from it by at most a factor ln(4) in the exponent for general values of cr. As a consequence, LSH-based similarity search in Hamming space can avoid the problem of false negatives at little or no cost in efficiency.
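
For context, here is a minimal Python sketch of the bit-sampling LSH in the Indyk-Motwani framework that the abstract uses as its efficiency baseline. Parameter values are arbitrary, and the covering construction itself is not shown; this only illustrates the probabilistic collision guarantee that a covering family turns into a certainty.

```python
import random

def sample_hash(d, k, seed=None):
    # One bit-sampling LSH function for d-bit Hamming space:
    # project onto k coordinates drawn uniformly at random.
    rng = random.Random(seed)
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda v: tuple(v[i] for i in coords)

d, k = 64, 8
h = sample_hash(d, k, seed=42)
rng = random.Random(1)
x = [rng.randrange(2) for _ in range(d)]
y = list(x)
y[0] ^= 1  # flip one bit: Hamming distance 1
# Vectors at distance r collide with probability (1 - r/d)^k, so
# false negatives are possible; a covering family instead guarantees
# a collision for every pair within radius r.
print(h(x) == h(y))
```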

44 citations

Proceedings ArticleDOI
13 Oct 2015
TL;DR: A semi-supervised deep learning hashing (DLH) method for fast multimedia retrieval that utilizes both visual and label information to learn a relative similarity graph that more precisely reflects the relationships among training data, and then generates the hash codes based on the graph.
Abstract: Learning-based hashing methods are becoming the mainstream for approximate scalable multimedia retrieval. They consist of two main components: hash codes learning for training data and hash functions learning for new data points. Tremendous efforts have been devoted to designing novel methods for these two components, i.e., supervised and unsupervised methods for learning hash codes, and different models for inferring hashing functions. However, there is little work integrating supervised and unsupervised hash codes learning into a single framework. Moreover, the hash function learning component is usually based on hand-crafted visual features extracted from the training images. The performance of a content-based image retrieval system crucially depends on the feature representation, and such hand-crafted visual features may degrade the accuracy of the hash functions. In this paper, we propose a semi-supervised deep learning hashing (DLH) method for fast multimedia retrieval. More specifically, in the first component, we utilize both visual and label information to learn a relative similarity graph that can more precisely reflect the relationships among training data, and then generate the hash codes based on the graph. In the second component, we apply a deep convolutional neural network (CNN) to simultaneously learn a good multimedia representation and hash functions. Extensive experiments on three popular datasets demonstrate the superiority of our DLH over both supervised and unsupervised hashing methods.
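
A minimal sketch of the generic two-stage pattern the abstract describes, with random arrays standing in for learned CNN features (the DLH similarity graph and network training are beyond a few lines): real-valued representations are binarized into hash codes and compared by Hamming distance.

```python
import numpy as np

def binarize(embeddings):
    # Sign-threshold real-valued features into binary hash codes.
    # (One common quantizer; not necessarily the one DLH uses.)
    return (embeddings > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 48))  # stand-in for CNN features -> 48-bit codes
codes = binarize(feats)
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```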

44 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel hashing method, referred to as topology preserving hashing (TPH), which is distinct from prior works by also preserving the neighborhood ranking; its unsupervised variant is capable of mining semantic relationships between unlabeled data without supervised information.
Abstract: Hashing-based similarity search techniques are becoming increasingly popular for large data sets. To capture meaningful neighbors, the topology of a data set, which represents the neighborhood relationships between its subregions and the relative proximities between the neighbors of each subregion, e.g., the relative neighborhood ranking of each subregion, should be exploited. However, most existing hashing methods are developed to preserve neighborhood relationships while ignoring the relative neighborhood proximities. Moreover, most hashing methods fall short of providing a good result ranking, since there are often many results sharing the same Hamming distance to a query. In this paper, we propose a novel hashing method to solve these two issues jointly. The proposed method is referred to as topology preserving hashing (TPH). TPH is distinct from prior works by also preserving the neighborhood ranking. Based on this framework, we present three different TPH methods, including linear unsupervised TPH, semi-supervised TPH, and kernelized TPH. Particularly, our unsupervised TPH is capable of mining semantic relationships between unlabeled data without supervised information. Extensive experiments on four large data sets demonstrate the superior performance of the proposed methods over several state-of-the-art unsupervised and semi-supervised hashing techniques.
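
The ranking issue the abstract raises is easy to reproduce: b-bit codes admit only b + 1 distinct Hamming distances, so large result sets necessarily contain many ties. A small illustration in Python:

```python
from collections import Counter
from itertools import product

b = 8
query = (0,) * b
codes = list(product((0, 1), repeat=b))  # all 2^b possible codes
dist = lambda c: sum(q != v for q, v in zip(query, c))
# Only b + 1 distances are possible; e.g. 70 of the 256 8-bit codes
# sit at distance exactly 4 from the query and cannot be ranked
# by Hamming distance alone.
print(Counter(map(dist, codes)))
```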

44 citations

Journal ArticleDOI
TL;DR: This paper proposes a new hashing scheme, asymmetric cyclical hashing, that uses two hash codes of different lengths for queries and stored images to reduce the storage requirement while yielding a better precision rate for retrieved images.
Abstract: This paper addresses a problem in the hashing technique for large-scale image retrieval: learn a compact hash code to reduce the storage cost with performance comparable to that of the long hash code. A longer hash code yields a better precision rate of retrieved images. However, it also requires larger storage, which limits the number of stored images. Current hashing methods employ the same code length for both queries and stored images. We propose a new hashing scheme using two hash codes with different lengths for queries and stored images, i.e., the asymmetric cyclical hashing. A compact hash code is used to reduce the storage requirement, while a long hash code is used for the query image. The image retrieval is performed by computing the Hamming distance of the long hash code of the query and the cyclically concatenated compact hash code of the stored image to yield a high precision and recall rate. Experiments on benchmark databases consisting of up to one million images show the effectiveness of the proposed method.
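
A minimal sketch of the matching step described above: the compact stored code is cyclically concatenated up to the query code's length, and the Hamming distance is then taken. The code lengths and bit values below are hypothetical, and for simplicity the long length is assumed to be a multiple of the short one.

```python
def cyclic_hamming(long_query_code, short_db_code):
    # Hamming distance between a long query code and the cyclical
    # concatenation of a compact stored code, as in asymmetric
    # cyclical hashing. Assumes len(long) is a multiple of len(short).
    m = len(long_query_code)
    k = len(short_db_code)
    repeated = (short_db_code * (m // k + 1))[:m]  # cyclic concatenation
    return sum(a != b for a, b in zip(long_query_code, repeated))

q  = [1, 0, 1, 1, 0, 0, 1, 0]  # 8-bit query code (hypothetical)
db = [1, 0, 1, 0]              # 4-bit stored code (hypothetical)
print(cyclic_hamming(q, db))   # distance to [1,0,1,0,1,0,1,0] -> 2
```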

43 citations

Proceedings ArticleDOI
19 Jun 2017
TL;DR: In this article, the authors considered the problem of approximate set similarity search under Braun-Blanquet similarity B(x, y) = |x ∩ y| / max(|x|, |y|) and presented a simple data structure that solves this problem with space usage O(n^(1+ρ) log n + Σ_{x ∈ P} |x|), where n = |P| and ρ = log(1/b1)/log(1/b2).
Abstract: We consider the problem of approximate set similarity search under Braun-Blanquet similarity B(x, y) = |x ∩ y| / max(|x|, |y|). The (b1, b2)-approximate Braun-Blanquet similarity search problem is to preprocess a collection of sets P such that, given a query set q, if there exists x ∈ P with B(q, x) ≥ b1, then we can efficiently return x′ ∈ P with B(q, x′) > b2. We present a simple data structure that solves this problem with space usage O(n^(1+ρ) log n + Σ_{x ∈ P} |x|) and query time O(|q| n^ρ log n), where n = |P| and ρ = log(1/b1)/log(1/b2). Making use of existing lower bounds for locality-sensitive hashing by O'Donnell et al. (TOCT 2014), we show that this value of ρ is tight across the parameter space, i.e., for every choice of constants 0 < b2 < b1 < 1. In the case where all sets have the same size, our solution strictly improves upon the value of ρ that can be obtained through the use of state-of-the-art data-independent techniques in the Indyk-Motwani locality-sensitive hashing framework (STOC 1998), such as Broder's MinHash (CCS 1997) for Jaccard similarity and Andoni et al.'s cross-polytope LSH (NIPS 2015) for cosine similarity. Surprisingly, even though our solution is data-independent, for a large part of the parameter space we outperform the currently best data-dependent method by Andoni and Razenshteyn (STOC 2015).
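
A small sketch of the similarity measure defined in the abstract, with Jaccard similarity alongside for contrast; the example sets are hypothetical.

```python
def braun_blanquet(x, y):
    # Braun-Blanquet similarity B(x, y) = |x ∩ y| / max(|x|, |y|).
    x, y = set(x), set(y)
    return len(x & y) / max(len(x), len(y))

def jaccard(x, y):
    # Jaccard similarity |x ∩ y| / |x ∪ y|, the measure MinHash targets.
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

a, b = {1, 2, 3, 4}, {3, 4, 5}
print(braun_blanquet(a, b))  # 2 / 4 = 0.5
print(jaccard(a, b))         # 2 / 5 = 0.4
```

Note that when all sets have the same size s, B = I/s and J = I/(2s - I) for intersection size I, so the two measures induce the same ranking; this is the regime in which the abstract claims a strict improvement over MinHash-based approaches.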

43 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139