Topic

Locality-sensitive hashing

About: Locality-sensitive hashing (LSH) is a technique that hashes similar items into the same buckets with high probability, enabling sublinear-time approximate nearest neighbor search. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.
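To make the topic concrete, here is a minimal sketch of the classic random-hyperplane (SimHash) flavor of locality-sensitive hashing, in which vectors that are close in cosine distance collide in the same bucket with high probability. All names, sizes, and parameters below are illustrative, not taken from any paper on this page.

```python
# Minimal random-hyperplane LSH (SimHash) sketch: similar vectors
# (small cosine distance) land in the same bucket with high probability.
import numpy as np

rng = np.random.default_rng(0)

def simhash_signature(x, hyperplanes):
    """One bit per hyperplane: which side of the plane x falls on."""
    return tuple((hyperplanes @ x > 0).astype(int))

d, n_bits = 64, 16
hyperplanes = rng.standard_normal((n_bits, d))

# Index a toy dataset into hash buckets keyed by signature.
data = rng.standard_normal((1000, d))
buckets = {}
for i, x in enumerate(data):
    buckets.setdefault(simhash_signature(x, hyperplanes), []).append(i)

# Query: only candidates in the colliding bucket need an exact check.
q = data[0] + 0.01 * rng.standard_normal(d)   # a near-duplicate of point 0
candidates = buckets.get(simhash_signature(q, hyperplanes), [])
print(0 in candidates)  # True with high probability
```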


Papers
Journal ArticleDOI
TL;DR: Zhang et al. propose a supervised learning framework that generates compact and bit-scalable hashing codes directly from raw images, posing hashing learning as a problem of regularized similarity learning.
Abstract: Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., learning hash functions from a predefined hand-crafted feature space. Meanwhile, the bit lengths of output hashing codes are preset in most previous methods, neglecting the significance level of different bits and restricting their practical flexibility. To address these issues, we propose a supervised learning framework to generate compact and bit-scalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. In particular, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between the matched pairs and the mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce adjacency consistency, i.e., images of similar appearance should have similar codes. A deep convolutional neural network is utilized to train the model in an end-to-end fashion, where discriminative image features and hash functions are simultaneously optimized. Furthermore, each bit of our hashing codes is unequally weighted, so that we can manipulate the code lengths by truncating the insignificant bits. Our framework outperforms state-of-the-art methods on public benchmarks of similar image search and also achieves promising results in the application of person re-identification in surveillance. It is also shown that the generated bit-scalable hashing codes preserve discriminative power well at shorter code lengths.
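As a rough illustration of the triplet objective described above, the sketch below computes a hinge loss on the Hamming-distance margin between a matched and a mismatched pair, using tanh outputs as a differentiable surrogate for binary codes. The stand-in vectors and the margin value are assumptions, not the paper's actual network or settings.

```python
# Hedged sketch of a triplet hashing objective: push the matched pair at
# least `margin` closer than the mismatched pair in (relaxed) Hamming space.
import numpy as np

def relaxed_hamming(a, b):
    # For codes in {-1, +1}, Hamming distance = (bits - a.b) / 2;
    # with tanh outputs in (-1, 1) this becomes a differentiable surrogate.
    return 0.5 * (a.shape[-1] - np.dot(a, b))

def triplet_hash_loss(anchor, positive, negative, margin=4.0):
    d_pos = relaxed_hamming(anchor, positive)
    d_neg = relaxed_hamming(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)   # hinge on the Hamming margin

bits = 48
rng = np.random.default_rng(1)
a = np.tanh(rng.standard_normal(bits))              # stand-in for a CNN output
p = np.tanh(a + 0.1 * rng.standard_normal(bits))    # same-label image
n = np.tanh(rng.standard_normal(bits))              # different-label image
print(triplet_hash_loss(a, p, n))
```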

457 citations

Proceedings Article
Jae-Pil Heo1, Youngwoon Lee1, Junfeng He2, Shih-Fu Chang2, Sung-Eui Yoon1 
16 Jun 2012
TL;DR: Extensive experiments show that the spherical hashing technique significantly outperforms six state-of-the-art hyperplane-based hashing techniques across image benchmarks ranging from one million to 75 million GIST descriptors, confirming the unique merits of using hyperspheres to encode proximity regions in high-dimensional spaces.
Abstract: Many binary code encoding schemes based on hashing have been actively studied recently, since they can provide efficient similarity search, especially nearest neighbor search, and compact data representations suitable for handling large-scale image databases in many computer vision problems. Existing hashing techniques encode high-dimensional data points by using hyperplane-based hashing functions. In this paper we propose a novel hypersphere-based hashing function, spherical hashing, to map more spatially coherent data points into a binary code compared to hyperplane-based hashing functions. Furthermore, we propose a new binary code distance function, spherical Hamming distance, that is tailored to our hypersphere-based binary coding scheme, and design an efficient iterative optimization process to achieve balanced partitioning of data points for each hash function and independence between hashing functions. Our extensive experiments show that our spherical hashing technique significantly outperforms six state-of-the-art hashing techniques based on hyperplanes across various image benchmarks ranging in size from one million to 75 million GIST descriptors. The performance gains are consistent and large, up to 100% improvements. The excellent results confirm the unique merits of the proposed idea in using hyperspheres to encode proximity regions in high-dimensional spaces. Finally, our method is intuitive and easy to implement.
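A hedged sketch of the two ideas in this abstract: a hypersphere-based code (bit i is 1 iff the point lies inside sphere i) and a spherical Hamming distance that normalizes the XOR count by the number of spheres containing both points. The random centers and crude radius guess below stand in for the paper's iteratively optimized ones.

```python
# Hedged sketch of hypersphere-based encoding and its distance function.
import numpy as np

rng = np.random.default_rng(2)
d, n_bits = 32, 8
centers = rng.standard_normal((n_bits, d))
# Radius near the median inter-point distance of N(0, I) data: a crude guess,
# not the paper's optimized, balance-enforcing radii.
radii = np.full(n_bits, np.sqrt(2 * d))

def spherical_code(x):
    # Bit i = 1 iff x falls inside sphere (centers[i], radii[i]).
    return (np.linalg.norm(x - centers, axis=1) <= radii).astype(int)

def spherical_hamming(a, b):
    # XOR count normalized by the number of spheres containing both points.
    both = np.sum(a & b)
    return np.sum(a ^ b) / both if both else np.inf

x, y = rng.standard_normal(d), rng.standard_normal(d)
print(spherical_code(x), spherical_hamming(spherical_code(x), spherical_code(y)))
```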

455 citations

01 Jan 1998
TL;DR: ANN is a library of C++ objects and procedures that supports approximate nearest neighbor searching, written as a testbed for a class of nearest neighbor searching algorithms, particularly those based on orthogonal decompositions of space.
Abstract: ANN is a library of C++ objects and procedures that supports approximate nearest neighbor searching. In nearest neighbor searching, we are given a set of data points S in real d-dimensional space, R^d, and are to build a data structure such that, given any query point q ∈ R^d, the nearest data point to q can be found efficiently. In general, we are given k ≥ 1, and are asked to return the k nearest neighbors to q in S. In approximate nearest neighbor searching, an error bound ε ≥ 0 is also given. The search algorithm returns k distinct points of S, such that the ratio between the distance to the ith point reported and the true ith nearest neighbor is at most 1 + ε. Among the features of ANN are the following. It supports k-nearest neighbor searching, by specifying k with the query. It supports both exact and approximate nearest neighbor searching, by specifying an approximation factor ε ≥ 0 with the query. It supports all Minkowski distance metrics, including the L1 (Manhattan), L2 (Euclidean), and L∞ (Max) metrics. There are no exponential factors in space, implying that the data structure is practical even for very large data sets in high-dimensional spaces, irrespective of ε. ANN is written as a testbed for a class of nearest neighbor searching algorithms, particularly those based on orthogonal decompositions of space. These include k-d trees [3, 4], balanced box-decomposition trees [2] and other related spatial data structures (see Samet [5]). The library supports a number of different methods for building search structures. It also supports two methods for searching these structures: standard tree-ordered search [1] and priority search [2]. In priority search, the cells of the data structure are visited in increasing order of distance from the query point. In addition to the library there are two programs provided for testing and evaluating the performance of various search methods. The first, called ann_test, provides a primitive script language that allows the user to generate data sets and query sets, either by reading from a file or randomly through the use of a number of built-in point distributions. Any of a …
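ANN itself is a C++ library, but the same (1 + ε)-approximate k-nearest-neighbor contract can be demonstrated with SciPy's k-d tree, shown below as a rough Python analogue rather than a binding to ANN.

```python
# Not the ANN C++ library itself: SciPy's k-d tree offers the same
# (1 + eps)-approximate k-NN guarantee and Minkowski metrics, as an analogue.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
data = rng.standard_normal((10_000, 16))
tree = cKDTree(data)   # build phase, as with ANN's k-d tree structures

q = rng.standard_normal(16)
# eps=0.5: each reported distance is within (1 + 0.5) of the true
# ith-nearest-neighbor distance; eps=0 gives exact search.
dist, idx = tree.query(q, k=5, eps=0.5, p=2)   # p=1, 2, or np.inf (Minkowski)
print(idx, dist)
```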

438 citations

Proceedings ArticleDOI
Kaiming He1, Fang Wen1, Jian Sun1
23 Jun 2013
TL;DR: A novel Affinity-Preserving K-means algorithm simultaneously performs k-means clustering and learns the binary indices of the quantized cells, outperforming various state-of-the-art hashing encoding methods.
Abstract: In computer vision there has been increasing interest in learning hashing codes whose Hamming distance approximates the data similarity. The hashing functions play roles in both quantizing the vector space and generating similarity-preserving codes. Most existing hashing methods use hyperplanes (or kernelized hyperplanes) to quantize and encode. In this paper, we present a hashing method adopting k-means quantization. We propose a novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells. The distance between the cells is approximated by the Hamming distance of the cell indices. We further generalize our algorithm to a product space for learning longer codes. Experiments show our method, named K-means Hashing (KMH), outperforms various state-of-the-art hashing encoding methods.
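A hedged sketch of the quantization idea above: cluster with k-means, give each centroid a b-bit index, and encode a point as the index of its nearest centroid, so distances between cells are approximated by Hamming distances between indices. The naive index assignment below is a stand-in for the paper's affinity-preserving optimization, which chooses indices so that Hamming distance tracks inter-centroid distance.

```python
# Hedged sketch of k-means quantization for hashing (naive index assignment).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
data = rng.standard_normal((2000, 32))

bits = 4
km = KMeans(n_clusters=2 ** bits, n_init=10, random_state=0).fit(data)

def encode(x):
    cell = km.predict(x.reshape(1, -1))[0]            # nearest centroid id
    return np.array([(cell >> i) & 1 for i in range(bits)])

a, b = encode(data[0]), encode(data[1])
print(a, b, np.sum(a != b))   # Hamming distance between cell indices
```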

437 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods in retrieval precision, and that for high-dimensional data it is orders of magnitude faster to train than many competing methods.
Abstract: Supervised hashing aims to map the original features to compact binary codes that are able to preserve label-based similarity in the Hamming space. Non-linear hash functions have demonstrated their advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve non-linearity in hashing, which achieve encouraging retrieval performance at the price of slow evaluation and training time. Here we propose to use boosted decision trees for achieving non-linearity in hashing, which are fast to train and evaluate, hence more suitable for hashing with high-dimensional data. In our approach, we first propose sub-modular formulations for the hashing binary code inference problem and an efficient GraphCut-based block search method for solving large-scale inference. Then we learn hash functions by training boosted decision trees to fit the binary codes. Experiments demonstrate that our proposed method significantly outperforms most state-of-the-art methods in retrieval precision and training time. Especially for high-dimensional data, our method is orders of magnitude faster than many methods in terms of training time.
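A hedged sketch of the second stage described above: train one boosted-tree classifier per bit to fit target binary codes. Here the targets are signs of random projections, a stand-in for the codes the paper infers with its GraphCut-based block search.

```python
# Hedged sketch: one boosted-tree classifier per hash bit, fit to target codes.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(5)
X = rng.standard_normal((1000, 100))    # stand-in high-dimensional features
bits = 8
# Stand-in targets: signs of random projections, NOT the paper's inferred codes.
targets = (X @ rng.standard_normal((100, bits)) > 0).astype(int)

hash_functions = [
    GradientBoostingClassifier(n_estimators=50, max_depth=2).fit(X, targets[:, i])
    for i in range(bits)
]

def hash_code(x):
    x = x.reshape(1, -1)
    return np.array([h.predict(x)[0] for h in hash_functions])

print(hash_code(X[0]))
```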

418 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139