Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published within this topic, receiving 51,462 citations.
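As background on the topic itself, feature hashing (the "hashing trick") maps named features to indices of a fixed-length vector with a hash function, so no vocabulary has to be stored. A minimal Python sketch; the dimension, the choice of MD5, and the sign trick are illustrative, not tied to any particular paper below:

```python
import hashlib

def hash_features(token_counts, dim=1024):
    """Map a dict of {feature_name: value} into a fixed-length vector
    using the hashing trick: index = hash(name) mod dim, with a second
    hash deciding the sign to reduce bias from collisions."""
    vec = [0.0] * dim
    for name, value in token_counts.items():
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        index = int(digest, 16) % dim
        sign = 1.0 if int(digest[-1], 16) % 2 == 0 else -1.0
        vec[index] += sign * value
    return vec

# Example: any document lands in the same 1024-dim space without a vocabulary.
doc = {"feature": 2, "hashing": 1, "scales": 1}
print(sum(v != 0 for v in hash_features(doc)))  # number of occupied bins
```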


Papers
Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes a supervised hashing method, the LAbel-regularized Max-margin Partition (LAMP) algorithm, which generates hash functions in a weakly-supervised setting and provides a collision bound that goes beyond pairwise data interaction, based on Markov random field theory.
Abstract: The explosive growth of vision data motivates recent studies on efficient data indexing methods such as locality-sensitive hashing (LSH). Most existing approaches perform hashing in an unsupervised way. In this paper we move one step forward and propose a supervised hashing method, the LAbel-regularized Max-margin Partition (LAMP) algorithm. The proposed method generates hash functions in a weakly-supervised setting, where a small portion of sample pairs are manually labeled as "similar" or "dissimilar". We formulate the task as a Constrained Convex-Concave Procedure (CCCP), which can be relaxed into a series of convex sub-problems solvable with efficient Quadratic Programming (QP). The proposed hashing method has two further characteristics. 1) Most existing LSH approaches rely on linear feature representations; yet in vision research, kernel tricks are often more natural for gauging the similarity between visual objects, which corresponds to possibly infinite-dimensional Hilbert spaces. The proposed LAMP has natural support for kernel-based feature representations. 2) Traditional hashing methods assume uniform data distributions: the collision probability of two samples in hash buckets is determined only by their pairwise similarity, independent of the contextual data distribution. In contrast, we provide a collision bound that goes beyond pairwise data interaction, based on Markov random field theory. Extensive empirical evaluations are conducted on five widely-used benchmarks. It takes only several seconds to generate a new hashing function, and the adopted random supporting-vector scheme makes the LAMP algorithm scalable to large-scale problems. Experimental results validate the superiority of the LAMP algorithm over state-of-the-art kernel-based hashing methods.

166 citations
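The abstract contrasts LAMP with unsupervised locality-sensitive hashing; the sketch below shows only that unsupervised baseline (random-hyperplane LSH for cosine similarity), not the LAMP algorithm itself, with all sizes chosen for illustration:

```python
import numpy as np

def lsh_hash(X, n_bits=16, seed=0):
    """Unsupervised random-hyperplane LSH: each bit is the sign of a
    projection onto a random direction, so points that are close in
    cosine similarity tend to fall into the same bucket."""
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ hyperplanes > 0).astype(np.uint8)

X = np.random.default_rng(1).standard_normal((5, 64))  # 5 toy descriptors
print(lsh_hash(X))  # 5 rows of 16 hash bits
```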

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work focuses on the binary autoencoder model, which seeks to reconstruct an image from the binary code produced by the hash function, and shows that the optimization can be simplified with the method of auxiliary coordinates.
Abstract: An attractive approach for fast search in image databases is binary hashing, where each high-dimensional, real-valued image is mapped onto a low-dimensional, binary vector and the search is done in this binary space. Finding the optimal hash function is difficult because it involves binary constraints, and most approaches approximate the optimization by relaxing the constraints and then binarizing the result. Here, we focus on the binary autoencoder model, which seeks to reconstruct an image from the binary code produced by the hash function. We show that the optimization can be simplified with the method of auxiliary coordinates. This reformulates the optimization as alternating two easier steps: one that learns the encoder and decoder separately, and one that optimizes the code for each image. Image retrieval experiments show the resulting hash function outperforms or is competitive with state-of-the-art methods for binary hashing.

166 citations
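The abstract's premise is that search happens in the binary space; below is a minimal sketch of that retrieval step (packing codes into integers and ranking by Hamming distance), independent of how the autoencoder produced the codes, with toy data in place of real image codes:

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n, b) array of 0/1 bits into one unsigned integer per row,
    so Hamming distance becomes XOR plus a popcount."""
    weights = 1 << np.arange(bits.shape[1], dtype=np.uint64)
    return (bits.astype(np.uint64) * weights).sum(axis=1)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a packed query code."""
    xor = np.bitwise_xor(db_codes, query_code)
    dist = np.array([bin(int(v)).count("1") for v in xor])
    return np.argsort(dist)

bits = np.random.default_rng(0).integers(0, 2, size=(8, 32))  # toy 32-bit codes
codes = pack_codes(bits)
print(hamming_rank(codes[0], codes))  # item 0 should rank itself first
```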

Journal ArticleDOI
TL;DR: This paper presents a novel unsupervised multiview alignment hashing approach based on regularized kernel nonnegative matrix factorization, which can find a compact representation uncovering the hidden semantics and simultaneously respecting the joint probability distribution of data.
Abstract: Hashing is a popular and efficient method for nearest neighbor search in large-scale data spaces; it embeds high-dimensional feature descriptors into a similarity-preserving Hamming space of low dimension. For most hashing methods, retrieval performance heavily depends on the choice of the high-dimensional feature descriptor. Furthermore, a single type of feature cannot be descriptive enough for different images when it is used for hashing. Thus, how to combine multiple representations for learning effective hashing functions is a pressing task. In this paper, we present a novel unsupervised multiview alignment hashing approach based on regularized kernel nonnegative matrix factorization, which can find a compact representation uncovering the hidden semantics while respecting the joint probability distribution of the data. In particular, we seek a matrix factorization that effectively fuses the multiple information sources while discarding feature redundancy. Since the resulting problem is nonconvex and discrete, our objective function is optimized in an alternating manner with relaxation and converges to a locally optimal solution. After finding the low-dimensional representation, the hashing functions are obtained through multivariable logistic regression. The proposed method is systematically evaluated on three data sets: 1) Caltech-256; 2) CIFAR-10; and 3) CIFAR-20, and the results show that our method significantly outperforms state-of-the-art multiview hashing techniques.

159 citations
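As a hedged illustration of only the final step mentioned above, fitting per-bit hash functions by logistic regression on a learned low-dimensional representation, the sketch below uses random stand-in data (not the paper's kernel NMF output) and assumes scikit-learn is available; the median split and 0.5 threshold are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_hash_functions(X, Y):
    """Fit one logistic regression per dimension of the low-dimensional
    representation Y, using a median split of each dimension as the target."""
    models = []
    for j in range(Y.shape[1]):
        target = (Y[:, j] > np.median(Y[:, j])).astype(int)
        models.append(LogisticRegression(max_iter=1000).fit(X, target))
    return models

def hash_bits(models, X):
    """One bit per model: thresholded predicted probability."""
    return np.stack([m.predict_proba(X)[:, 1] > 0.5 for m in models], axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))   # stand-in image descriptors
Y = rng.standard_normal((100, 8))    # stand-in fused low-dimensional representation
models = fit_hash_functions(X, Y)
print(hash_bits(models, X[:3]).astype(int))  # 8-bit codes for three samples
```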

Proceedings ArticleDOI
28 Nov 2011
TL;DR: This work proposes a novel visual search system based on "Bag of Hash Bits" (BoHB), in which each local feature is encoded as a small number of hash bits instead of being quantized to a visual word, and the whole image is represented as a bag of hash bits.
Abstract: The advent of smart phones has provided an excellent platform for mobile visual search. Most previous mobile visual search systems adopt the "bag of words" framework, in which words are quantized codes of visual features. In this work, we propose a novel mobile visual search system based on "bag of hash bits". Using new ideas for hash bit selection, multi-hash-table generation, and Hamming-distance soft scoring, we overcome the bit inefficiency affecting traditional hashing approaches and achieve promising accuracy that outperforms the state of the art. The framework is also general in that any feature type can be used to generate the hash bits. Demos and experiments on a large-scale product image set demonstrate the effectiveness of our approach.

153 citations
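A rough sketch of the general "bag of hash bits" indexing idea: several hash tables keyed on different bit slices of each local-feature code, with Hamming-distance soft scoring at query time. The bit-slicing and the exponential weighting are assumptions for illustration, not the paper's actual bit-selection scheme:

```python
import numpy as np
from collections import defaultdict

def build_tables(codes, image_ids, n_tables=4, bits_per_table=8):
    """Index local-feature codes (0/1 arrays) in several hash tables,
    each keyed on a different slice of the bits."""
    tables = [defaultdict(list) for _ in range(n_tables)]
    for t in range(n_tables):
        keys = codes[:, t * bits_per_table:(t + 1) * bits_per_table]
        for key, img, code in zip(map(tuple, keys), image_ids, codes):
            tables[t][key].append((img, code))
    return tables

def soft_score(query_codes, tables, n_tables=4, bits_per_table=8):
    """Vote for database images, weighting each collision by exp(-Hamming distance)."""
    scores = defaultdict(float)
    for q in query_codes:
        for t in range(n_tables):
            key = tuple(q[t * bits_per_table:(t + 1) * bits_per_table])
            for img, code in tables[t].get(key, []):
                scores[img] += np.exp(-np.count_nonzero(q != code))
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Multiple tables raise recall (a feature only needs to collide in one slice), while the soft score downweights collisions whose full codes are far apart in Hamming distance.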

Posted Content
TL;DR: This paper proposes a deep supervised discrete hashing algorithm based on the assumption that the learned binary codes should be ideal for classification, addressing limitations of previous deep hashing methods (e.g., that the semantic information is not fully exploited).
Abstract: With the rapid growth of image and video data on the web, hashing has been extensively studied for image and video search in recent years. Benefiting from recent advances in deep learning, deep hashing methods have achieved promising results for image retrieval. However, previous deep hashing methods have some limitations (e.g., the semantic information is not fully exploited). In this paper, we develop a deep supervised discrete hashing algorithm based on the assumption that the learned binary codes should be ideal for classification. Both the pairwise label information and the classification information are used to learn the hash codes within a one-stream framework. We constrain the outputs of the last layer to be binary codes directly, which is rarely investigated in deep hashing algorithms. Because of the discrete nature of hash codes, an alternating minimization method is used to optimize the objective function. Experimental results show that our method outperforms current state-of-the-art methods on benchmark datasets.

152 citations
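A toy evaluation of the kind of objective described above: a pairwise term on code agreement plus a classification term on the binary codes themselves. The specific loss forms, weights, and data here are assumptions for illustration, and the paper optimizes such an objective by alternating minimization rather than the single evaluation shown below:

```python
import numpy as np

def toy_objective(B, S, W, Y, mu=1.0):
    """Pairwise logistic loss on code agreement (similar pairs S=1 should
    have aligned codes) plus a squared loss for a linear classifier that
    predicts labels directly from the binary codes."""
    theta = 0.5 * B @ B.T                                      # pairwise code agreement
    pairwise = -np.sum(S * theta - np.logaddexp(0.0, theta))   # -log-likelihood of S
    classification = np.sum((Y - B @ W) ** 2)                  # codes should predict labels
    return pairwise + mu * classification

rng = np.random.default_rng(0)
B = np.sign(rng.standard_normal((6, 12)))                       # six 12-bit codes in {-1, +1}
S = (rng.integers(0, 2, (6, 6)) + np.eye(6) > 0).astype(float)  # toy similarity matrix
Y = np.eye(3)[rng.integers(0, 3, 6)]                            # one-hot labels, three classes
W = rng.standard_normal((12, 3))                                # linear classifier on the codes
print(toy_objective(B, S, W, Y))
```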


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 33
2022: 89
2021: 11
2020: 16
2019: 16
2018: 38