
Feature hashing

About: Feature hashing (also known as the "hashing trick") is a fast, space-efficient way of vectorizing features: a hash function maps each feature directly to an index in a fixed-size vector, so no dictionary of feature names needs to be stored. Over the lifetime, 993 publications have been published within this topic, receiving 51,462 citations.
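The core idea of the topic can be sketched in a few lines: hash each raw token to a bucket index in a fixed-size vector and accumulate counts there. The function name and the signed-bucket variant below are illustrative assumptions, not taken from any particular paper in this listing.

```python
import hashlib

def feature_hash(tokens, dim=16):
    """Hashing trick sketch: map each token to one of `dim` buckets via a
    hash function and accumulate a count there. A second bit of the hash
    picks a sign, the common trick for reducing collision bias."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        idx = h % dim                              # bucket index
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0  # pseudo-random sign
        vec[idx] += sign
    return vec
```

Because the vector size is fixed in advance, two different tokens may collide in one bucket; the signed variant makes such collisions partially cancel rather than always add.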


Papers
Journal ArticleDOI
TL;DR: The experimental results demonstrate that CMFH can significantly outperform several state-of-the-art cross-modality Hashing methods, which validates the effectiveness of the proposed CMFH.
Abstract: By transforming data into binary representations, i.e., hashing, we can perform high-speed search with low storage cost; hashing has therefore attracted increasing research interest in recent years. How to generate hash codes for multimodal data (e.g., images with textual tags, documents with photos, and so on) for large-scale cross-modality search (e.g., retrieving semantically related images from a database for a document query) is an important research issue because of the fast growth of multimodal data on the Web. To address this issue, a novel framework for multimodal hashing is proposed, termed Collective Matrix Factorization Hashing (CMFH). The key idea of CMFH is to learn unified hash codes for the different modalities of one multimodal instance in a shared latent semantic space in which the modalities can be effectively connected; accurate cross-modality search is therefore supported. Based on this general framework, we extend it to the unsupervised scenario, where it preserves the Euclidean structure, and to the supervised scenario, where it fully exploits the label information of the data. The corresponding theoretical analysis and optimization algorithms are given. We conducted comprehensive experiments on three benchmark data sets for cross-modality search. The experimental results demonstrate that CMFH significantly outperforms several state-of-the-art cross-modality hashing methods, which validates its effectiveness.
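The collective-matrix-factorization idea behind CMFH can be illustrated with a toy alternating least-squares sketch: two modality matrices are factored against one shared latent factor, whose signs give a single unified hash code per instance. The update rules, regularizer, and weighting below are illustrative assumptions, not the paper's actual optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, k = 50, 20, 10, 8  # instances, modality dims, code length

# Two modalities of the same instances, driven by a shared latent factor.
Z = rng.standard_normal((n, k))
X1 = Z @ rng.standard_normal((k, d1))   # e.g. image features
X2 = Z @ rng.standard_normal((k, d2))   # e.g. text features

U1 = rng.standard_normal((d1, k))
U2 = rng.standard_normal((d2, k))
V = rng.standard_normal((n, k))         # shared latent representation
lam = 0.5                               # weight balancing the modalities

for _ in range(30):
    # Least-squares update of each modality's basis with V fixed.
    U1 = np.linalg.lstsq(V, X1, rcond=None)[0].T
    U2 = np.linalg.lstsq(V, X2, rcond=None)[0].T
    # Closed-form update of the shared factor with U1, U2 fixed.
    A = lam * U1.T @ U1 + (1 - lam) * U2.T @ U2
    B = lam * X1 @ U1 + (1 - lam) * X2 @ U2
    V = np.linalg.solve(A + 1e-6 * np.eye(k), B.T).T

codes = (V > 0).astype(int)  # one unified hash code per instance
```

Either modality alone can then be mapped into the shared space and binarized the same way, which is what makes cross-modality lookup possible.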

181 citations

Journal ArticleDOI
TL;DR: Quantization-based Hashing (QBH) is a generic framework which incorporates the advantages of quantization error reduction methods into conventional property preserving hashing methods and can be applied to both unsupervised and supervised hashing methods.

179 citations

Journal ArticleDOI
TL;DR: The proposed RASH feature vector is more robust and provides much stronger discrimination than a conventional histogram-based feature vector, and appears to be a good candidate to build indexing algorithms, copy-detection systems, or content-based authentication mechanisms.
Abstract: Robust signal hashing defines a feature vector that characterizes the signal independently of "nonsignificant" distortions of its content. When dealing with images, the distortions considered are typically due to compression or small geometrical manipulations. In other words, robustness means that visually indistinguishable images should produce equal or similar hash values. To discriminate image contents, a hash function should produce distinct outputs for different images. Our paper first proposes a robust hashing algorithm for still images. It is based on radial projections of the image pixels and is denoted the Radial hASHing (RASH) algorithm. Experimental results on the USC-SIPI dataset reveal that the proposed RASH feature vector is more robust and provides much stronger discrimination than a conventional histogram-based feature vector. The RASH vector is thus a good candidate for building indexing algorithms, copy-detection systems, or content-based authentication mechanisms. To exploit the RASH vector's capabilities for video, content is summarized into key frames, each characterizing a video shot and described by its RASH vector. The resulting video hashing system works in real time and is robust to most common spatial and temporal video distortions.
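A minimal sketch of the radial-projection idea: for each of a set of angles, sample pixel intensities along the line through the image centre and keep one summary statistic per line. The variance statistic and sampling details below are simplifications assumed for illustration; the paper's actual RASH construction differs in its details.

```python
import numpy as np

def rash_vector(img, n_angles=40):
    """Radial-projection hash sketch: for each angle, sample intensities
    along the line through the image centre and keep their variance
    (an illustrative stand-in for the per-line RASH statistic)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r = min(cy, cx)
    feats = []
    for a in np.linspace(0, np.pi, n_angles, endpoint=False):
        t = np.linspace(-r, r, int(2 * r) + 1)        # positions along the line
        ys = np.clip(np.round(cy + t * np.sin(a)).astype(int), 0, h - 1)
        xs = np.clip(np.round(cx + t * np.cos(a)).astype(int), 0, w - 1)
        feats.append(img[ys, xs].var())
    return np.array(feats)
```

Because each component summarizes a whole line of pixels, small local changes (compression noise, slight crops) perturb the vector only mildly, which is the robustness property the abstract describes.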

175 citations

Proceedings ArticleDOI
20 Jun 2011
TL;DR: Experiments show that the new Random Maximum Margin Hashing scheme (RMMH) outperforms four state-of-the-art hashing methods, notably in kernel spaces.
Abstract: Following the success of hashing methods for multidimensional indexing, a growing body of work focuses on embedding visual feature spaces into compact hash codes. Such approaches are not an alternative to index structures but a complementary way to reduce both memory usage and distance-computation cost. Several data-dependent hash functions have notably been proposed to closely fit the data distribution and provide better selectivity than the usual random projections such as LSH. However, the improvements occur only for relatively small hash code sizes, up to 64 or 128 bits. As discussed in the paper, this is mainly due to the lack of independence between the produced hash functions. We introduce a new hash function family that attempts to solve this issue in any kernel space. Rather than boosting the collision probability of close points, our method focuses on data scattering. By training on purely random splits of the data, regardless of the closeness of the training samples, it is indeed possible to generate consistently more independent hash functions. On the other hand, the use of large-margin classifiers allows us to maintain good generalization performance. Experiments show that our new Random Maximum Margin Hashing scheme (RMMH) outperforms four state-of-the-art hashing methods, notably in kernel spaces.
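The RMMH construction described above — train each hash bit as a large-margin classifier on a purely random ±1 labelling of a small sample — can be sketched as follows. The paper trains an SVM per bit; the hinge-loss gradient descent below is a hedged stand-in for that, and all names and hyperparameters are illustrative.

```python
import numpy as np

def rmmh_train(X, n_bits=16, m=8, epochs=200, lr=0.1, rng=None):
    """RMMH sketch: each bit is a linear classifier fit on a random split
    (m points labelled +1, m labelled -1) by minimising a regularized
    hinge loss. The random labelling encourages independent bits."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    W = np.zeros((n_bits, d + 1))            # weights + bias per bit
    Xb = np.hstack([X, np.ones((n, 1))])     # append a bias feature
    for b in range(n_bits):
        idx = rng.choice(n, 2 * m, replace=False)
        y = np.array([1.0] * m + [-1.0] * m)  # arbitrary random split
        w = np.zeros(d + 1)
        for _ in range(epochs):
            margins = y * (Xb[idx] @ w)
            # Subgradient of hinge loss over violated margins, plus L2 term.
            grad = -(y[:, None] * Xb[idx])[margins < 1].sum(axis=0) + 0.01 * w
            w -= lr * grad / (2 * m)
        W[b] = w
    return W

def rmmh_hash(X, W):
    """Binarize by the sign of each learned max-margin hyperplane."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W.T > 0).astype(int)
```

Because each split is drawn independently of the data geometry, the resulting bits stay nearly independent of each other, which is the property the abstract identifies as missing from earlier data-dependent schemes at large code sizes.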

175 citations

Proceedings ArticleDOI
01 Dec 2013
TL;DR: The extensive results corroborate that hash codes learned via listwise supervision can provide superior search accuracy without incurring heavy computational overhead.
Abstract: Hashing techniques have been intensively investigated in the design of highly efficient search engines for large-scale computer vision applications. Compared with prior approximate nearest neighbor search approaches such as tree-based indexing, hashing-based search schemes have prominent advantages in terms of both storage and computational efficiency. Moreover, the procedure of devising hash functions can be easily incorporated into sophisticated machine learning tools, leading to data-dependent and task-specific compact hash codes. Therefore, a number of learning paradigms, ranging from unsupervised to supervised, have been applied to compose appropriate hash functions. However, most existing hash function learning methods either treat hash function design as a classification problem or generate binary codes to satisfy pairwise supervision, and have not yet directly optimized the search accuracy. In this paper, we propose to leverage listwise supervision in a principled hash function learning framework. In particular, the ranking information is represented by a set of rank triplets that can be used to assess the quality of a ranking. Simple linear projection-based hash functions are solved efficiently by maximizing the ranking quality over the training data. We carry out experiments on large image datasets with sizes up to one million and compare with state-of-the-art hashing techniques. The extensive results corroborate that our hash codes learned via listwise supervision can provide superior search accuracy without incurring heavy computational overhead.
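The rank triplets mentioned above can be used directly to score a set of binary codes: a triplet (q, i, j) states that item i should rank above item j for query q, and we count how often the Hamming-distance ranking agrees. This small evaluation helper is an illustrative assumption, not the paper's optimization objective.

```python
import numpy as np

def triplet_agreement(codes, triplets):
    """Fraction of rank triplets (q, i, j) -- item i should rank above
    item j for query q -- that the Hamming ranking over binary codes
    respects. `codes` is an (n_items, n_bits) 0/1 array."""
    def ham(a, b):
        return np.count_nonzero(codes[a] != codes[b])
    ok = sum(1 for q, i, j in triplets if ham(q, i) < ham(q, j))
    return ok / len(triplets)
```

Maximizing a smooth surrogate of exactly this agreement over the training triplets is the listwise idea the abstract describes; the helper above only measures it after the fact.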

172 citations


Network Information
Related Topics (5)
Feature extraction — 111.8K papers, 2.1M citations, 84% related
Convolutional neural network — 74.7K papers, 2M citations, 84% related
Feature (computer vision) — 128.2K papers, 1.7M citations, 84% related
Deep learning — 79.8K papers, 2.1M citations, 83% related
Support vector machine — 73.6K papers, 1.7M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  33
2022  89
2021  11
2020  16
2019  16
2018  38