scispace - formally typeset
Search or ask a question
Topic

Feature hashing

About: Feature hashing is a research topic. Over the lifetime, 993 publications have been published within this topic receiving 51462 citations.


Papers
More filters
Proceedings ArticleDOI
01 Dec 2013
TL;DR: This framework decomposes the hashing learning problem into two steps: hash bit learning and hash function learning based on the learned bits, and can typically be formulated as binary quadratic problems, and the second step can be accomplished by training standard binary classifiers.
Abstract: Most existing approaches to hashing apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. Here we propose a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. This framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: hash bit learning and hash function learning based on the learned bits. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training standard binary classifiers. Both problems have been extensively studied in the literature. Our extensive experiments demonstrate that the proposed framework is effective, flexible and outperforms the state-of-the-art.

200 citations

Journal ArticleDOI
TL;DR: A robust hashing method is developed for detecting image forgery including removal, insertion, and replacement of objects, and abnormal color modification, and for locating the forged area.
Abstract: A robust hashing method is developed for detecting image forgery including removal, insertion, and replacement of objects, and abnormal color modification, and for locating the forged area. Both global and local features are used in forming the hash sequence. The global features are based on Zernike moments representing luminance and chrominance characteristics of the image as a whole. The local features include position and texture information of salient regions in the image. Secret keys are introduced in feature extraction and hash construction. While being robust against content-preserving image processing, the hash is sensitive to malicious tampering and, therefore, applicable to image authentication. The hash of a test image is compared with that of a reference image. When the hash distance is greater than a threshold τ1 and less than τ2, the received image is judged as a fake. By decomposing the hashes, the type of image forgery and location of forged areas can be determined. Probability of collision between hashes of different images approaches zero. Experimental results are presented to show effectiveness of the method.

200 citations

Journal ArticleDOI
01 Dec 2009
TL;DR: An efficient data-parallel algorithm for building large hash tables of millions of elements in real-time, which considers a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations.
Abstract: We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real-time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.

194 citations

Journal ArticleDOI
TL;DR: This work incorporates ring partition and invariant vector distance to image hashing algorithm for enhancing rotation robustness and discriminative capability, and demonstrates that the proposed hashing algorithm is robust at commonly used digital operations to images.
Abstract: Robustness and discrimination are two of the most important objectives in image hashing. We incorporate ring partition and invariant vector distance to image hashing algorithm for enhancing rotation robustness and discriminative capability. As ring partition is unrelated to image rotation, the statistical features that are extracted from image rings in perceptually uniform color space, i.e., CIE L*a*b* color space, are rotation invariant and stable. In particular, the Euclidean distance between vectors of these perceptual features is invariant to commonly used digital operations to images (e.g., JPEG compression, gamma correction, and brightness/contrast adjustment), which helps in making image hash compact and discriminative. We conduct experiments to evaluate the efficiency with 250 color images, and demonstrate that the proposed hashing algorithm is robust at commonly used digital operations to images. In addition, with the receiver operating characteristics curve, we illustrate that our hashing is much better than the existing popular hashing algorithms at robustness and discrimination.

192 citations

Proceedings ArticleDOI
13 Apr 2008
TL;DR: A novel hash table construction called Peacock hash is proposed, which reduces the on-chip memory by more than 10-folds while keeping a high degree of determinism in performance.
Abstract: Hash tables are extensively used in networking to implement data-structures that associate a set of keys to a set of values, as they provide O(1), query, insert and delete operations. However, at moderate or high loads collisions are quite frequent which not only increases the access time, but also induces non- determinism in the performance. Due to this non-determinism, the performance of these hash tables degrades sharply in the multi-threaded network processor based environments, where a collection of threads perform the hashing operations in a loosely synchronized manner. In such systems, it is critical to keep the hash operations more deterministic. A recent series of papers have been proposed, which employs a compact on-chip memory to enable deterministic and fast hash queries. While effective, these schemes require substantial on- chip memory, roughly 10-bits for every entry in the hash table. This limits their general usability; specifically in the network processor context, where on-chip resources are scarce. In this paper, we propose a novel hash table construction called Peacock hash, which reduces the on-chip memory by more than 10-folds while keeping a high degree of determinism in performance. This significantly reduced on-chip memory not only makes Peacock hashing much more appealing for the general use but also makes it an attractive choice for the implementation of a hash hardware accelerator on a network processor.

187 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Support vector machine
73.6K papers, 1.7M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202333
202289
202111
202016
201916
201838