scispace - formally typeset
Search or ask a question
Topic

Feature hashing

About: Feature hashing is a research topic. Over the lifetime, 993 publications have been published within this topic receiving 51462 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: This paper proposes a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance and shows that theranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms.
Abstract: Hashing has attracted a great deal of research in recent years due to its effectiveness for the retrieval and indexing of large-scale high-dimensional multimedia data. In this paper, we propose a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing algorithms where the learned hash functions are binary space partitioning functions, such as the sign and threshold function, the proposed hashing scheme takes advantage of a new class of hash functions closely related to rank correlation measures which are known to be scale-invariant, numerically stable, and highly nonlinear. Specifically, we jointly learn two groups of linear subspaces, one for each modality, so that features’ ranking orders in different linear subspaces maximally preserve the cross-modal similarities. We show that the ranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms. The proposed hashing framework is also flexible in the sense that the optimization procedures are not tied up to any specific form of loss function, which is typical for existing cross-modal hashing methods, but rather we can flexibly accommodate different loss functions with minimal changes to the learning steps. We demonstrate through extensive experiments on four widely-used real-world multimodal datasets that the proposed cross-modal hashing method can achieve competitive performance against several state-of-the-arts with only moderate training and testing time.

117 citations

Journal ArticleDOI
TL;DR: A general method of file structuring is proposed which uses a hashing function to define tree structure, and results for the probability distributions of path lengths are derived and illustrated.
Abstract: A general method of file structuring is proposed which uses a hashing function to define tree structure Two types of such trees are examined, and their relation to trees studied in the past is explained Results for the probability distributions of path lengths are derived and illustrated

116 citations

Journal ArticleDOI
TL;DR: This paper proposes an unsupervised hashing method for semi-paired cross-view retrieval, dubbed semi-PAired discrete hashing (SPDH), and explores the underlying structure of the constructed common latent subspace, where both paired and unpaired samples are well aligned.
Abstract: Due to the significant reduction in computational cost and storage, hashing techniques have gained increasing interests in facilitating large-scale cross-view retrieval tasks. Most cross-view hashing methods are developed by assuming that data from different views are well paired, e.g., text-image pairs. In real-world applications, however, this fully-paired multiview setting may not be practical. The more practical yet challenging semi-paired cross-view retrieval problem, where pairwise correspondences are only partially provided, has less been studied. In this paper, we propose an unsupervised hashing method for semi-paired cross-view retrieval, dubbed semi-paired discrete hashing (SPDH). In specific, SPDH explores the underlying structure of the constructed common latent subspace, where both paired and unpaired samples are well aligned. To effectively preserve the similarities of semi-paired data in the latent subspace, we construct the cross-view similarity graph with the help of anchor data pairs. SPDH jointly learns the latent features and hash codes with a factorization-based coding scheme. For the formulated objective function, we devise an efficient alternating optimization algorithm, where the key binary code learning problem is solved in a bit-by-bit manner with each bit generated with a closed-form solution. The proposed method is extensively evaluated on four benchmark datasets with both fully-paired and semi-paired settings and the results demonstrate the superiority of SPDH over several other state-of-the-art methods in term of both accuracy and scalability.

110 citations

Journal ArticleDOI
TL;DR: The extensive experiments show that the spherical hashing technique significantly outperforms state-of-the-art techniques based on hyperplanes across various benchmarks with sizes ranging from one to 75 million of GIST, BoW and VLAD descriptors, and is intuitive and easy to implement.
Abstract: Many binary code embedding schemes have been actively studied recently, since they can provide efficient similarity search, and compact data representations suitable for handling large scale image databases. Existing binary code embedding techniques encode high-dimensional data by using hyperplane-based hashing functions. In this paper we propose a novel hypersphere-based hashing function, spherical hashing , to map more spatially coherent data points into a binary code compared to hyperplane-based hashing functions. We also propose a new binary code distance function, spherical Hamming distance , tailored for our hypersphere-based binary coding scheme, and design an efficient iterative optimization process to achieve both balanced partitioning for each hash function and independence between hashing functions. Furthermore, we generalize spherical hashing to support various similarity measures defined by kernel functions. Our extensive experiments show that our spherical hashing technique significantly outperforms state-of-the-art techniques based on hyperplanes across various benchmarks with sizes ranging from one to 75 million of GIST, BoW and VLAD descriptors. The performance gains are consistent and large, up to 100 percent improvements over the second best method among tested methods. These results confirm the unique merits of using hyperspheres to encode proximity regions in high-dimensional spaces. Finally, our method is intuitive and easy to implement.

107 citations

Proceedings ArticleDOI
07 Apr 2008
TL;DR: A novel formulation is presented, that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.
Abstract: A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficient approximate nearest neighbor retrieval. Hashing methods, such as locality sensitive hashing (LSH), have been successfully applied for similarity indexing in vector spaces and string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it can be applied to spaces with arbitrary distance measures, including non-metric distance measures. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multibit hash tables. We show that the LSH formalism is not applicable for analyzing the behavior of these tables as index structures. We present a novel formulation, that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.

105 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Support vector machine
73.6K papers, 1.7M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202333
202289
202111
202016
201916
201838