Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have appeared within this topic, receiving 51,462 citations.


Papers
Proceedings ArticleDOI
27 May 2018
TL;DR: The Weight-Median Sketch introduced in this paper adopts the core data structure of the Count-Sketch but, instead of sketching counts, captures sketched gradient updates to the model parameters.
Abstract: We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently-occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that establishes recovery guarantees for batch and online learning, and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods, including count-based sketches and feature hashing.

32 citations
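
The abstract above describes a Count-Sketch layout that stores gradient updates instead of counts and recovers weights by a median over rows. A minimal Python sketch of that idea follows. It is an illustration under assumptions, not the authors' implementation: the class name `WeightMedianSketchDemo`, the logistic loss, the `1/sqrt(depth)` scaling, the BLAKE2b-based hashing, and all default parameters are choices made here for the example.

```python
import hashlib
import math
import statistics


def _hash(feature: str, row: int, salt: str) -> int:
    """Deterministic 64-bit hash of (salt, row, feature). Hypothetical helper."""
    data = f"{salt}:{row}:{feature}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class WeightMedianSketchDemo:
    """Illustrative only: a Count-Sketch table that accumulates gradient steps."""

    def __init__(self, depth: int = 3, width: int = 256, lr: float = 0.1):
        self.depth, self.width, self.lr = depth, width, lr
        self.table = [[0.0] * width for _ in range(depth)]
        self.scale = 1.0 / math.sqrt(depth)  # assumed projection scaling

    def _slot(self, feature: str, row: int):
        """Bucket index and random sign of a feature in one row."""
        bucket = _hash(feature, row, "bucket") % self.width
        sign = 1.0 if _hash(feature, row, "sign") % 2 == 0 else -1.0
        return bucket, sign

    def margin(self, x: dict) -> float:
        """Inner product between the sketch and the projected example."""
        total = 0.0
        for feature, value in x.items():
            for row in range(self.depth):
                bucket, sign = self._slot(feature, row)
                total += self.scale * sign * self.table[row][bucket] * value
        return total

    def update(self, x: dict, y: int) -> None:
        """One gradient step on logistic loss; y must be +1 or -1."""
        grad = -y / (1.0 + math.exp(y * self.margin(x)))
        for feature, value in x.items():
            for row in range(self.depth):
                bucket, sign = self._slot(feature, row)
                self.table[row][bucket] -= self.lr * grad * self.scale * sign * value

    def estimate_weight(self, feature: str) -> float:
        """Median over rows of the sign-corrected, rescaled bucket values."""
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._slot(feature, row)
            estimates.append(sign * self.table[row][bucket] / self.scale)
        return statistics.median(estimates)
```

Feeding the sketch a stream such as `sketch.update({"spam": 1.0}, +1)` and `sketch.update({"ham": 1.0}, -1)` and then calling `estimate_weight` gives a rough per-feature weight without storing a full weight vector. The median is what limits the damage when a feature collides with another one in a single row.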

Journal ArticleDOI
TL;DR: The definition of OP is given and it is demonstrated that it meets all the requirements of a hash function and thus would be appropriate for hashing-based fast palmprint identification.
Abstract: In this paper, we present two hashing-based techniques for fast palmprint identification. We first propose three properties required by a hash function and then introduce our first fast identification method based on orientation pattern (OP) hashing. We give the definition of OP and demonstrate that it meets all the requirements of the hash function and thus would be appropriate for hashing-based fast palmprint identification. We then introduce the second fast identification method based on principal orientation pattern (POP) hashing. Because the POPs are constructed using more stable orientation features, POP hashing can find the target template more quickly, thus causing earlier termination of the identification process. We evaluate our methods on the Hong Kong PolyU large-scale database (9667 palms) and the CASIA palmprint database (600 palms) plus a synthetic database (100 000 palms). Experimental results show that, on the Hong Kong PolyU large-scale database, the speedups of OP hashing and POP hashing over brute-force search are 16.93 and 19.91, respectively, and the identification accuracy is slightly higher. On the CASIA database plus the synthetic database, the speedups of OP hashing and POP hashing are 8.03 and 15.67, respectively, and the identification accuracy remains almost the same. Results also show that, in terms of accuracy, our methods are comparable to several state-of-the-art palmprint identification approaches, while in terms of speed, our methods are much faster.

32 citations

Posted Content
TL;DR: SSDH performs joint learning of image representations, hash codes, and classification in a pointwise manner and thus is naturally scalable to large-scale datasets; it outperforms other unsupervised and supervised hashing approaches on several benchmarks and one large dataset comprising more than 1 million images.
Abstract: This paper presents a supervised deep hashing approach that constructs binary hash codes from labeled data for large-scale image search. We assume that semantic labels are governed by a set of latent attributes in which each attribute can be on or off, and classification relies on these attributes. Based on this assumption, our approach, dubbed supervised semantics-preserving deep hashing (SSDH), constructs hash functions as a latent layer in a deep network in which binary codes are learned by the optimization of an objective function defined over classification error and other desirable properties of hash codes. With this design, SSDH has a nice property that classification and retrieval are unified in a single learning model, and the learned binary codes not only preserve the semantic similarity between images but also are efficient for image search. Moreover, SSDH performs joint learning of image representations, hash codes, and classification in a pointwise manner and thus is naturally scalable to large-scale datasets. SSDH is simple and can be easily realized by a slight modification of an existing deep architecture for classification; yet it is effective and outperforms other unsupervised and supervised hashing approaches on several benchmarks and one large dataset comprising more than 1 million images.

32 citations

Book ChapterDOI
21 Aug 2009
TL;DR: This paper significantly advances the state of the art by proving a polylogarithmic bound on the more efficient random-walk method, where items repeatedly kick out random blocking items until a free location for an item is found.
Abstract: In this paper, we provide a polylogarithmic bound that holds with high probability on the insertion time for cuckoo hashing under the random-walk insertion method. Cuckoo hashing provides a useful methodology for building practical, high-performance hash tables. The essential idea of cuckoo hashing is to combine the power of schemes that allow multiple hash locations for an item with the power to dynamically change the location of an item among its possible locations. Previous work on the case where the number of choices is larger than two has required a breadth-first search analysis, which is both inefficient in practice and currently has only a polynomial high probability upper bound on the insertion time. Here we significantly advance the state of the art by proving a polylogarithmic bound on the more efficient random-walk method, where items repeatedly kick out random blocking items until a free location for an item is found.

32 citations
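
The random-walk insertion described in the abstract above (an item with several candidate locations kicks out a random blocking item until a free location is found) can be sketched in Python as follows. This is a toy illustration under assumptions, not the construction analyzed in the paper: the class name `RandomWalkCuckooTable`, the salted use of Python's built-in `hash`, the `max_kicks` cutoff, and the default sizes are all hypothetical choices for the example.

```python
import random


class RandomWalkCuckooTable:
    """Toy cuckoo hash set with several choices per key (illustrative only)."""

    def __init__(self, capacity: int = 64, choices: int = 3,
                 max_kicks: int = 500, seed: int = 0):
        self.slots = [None] * capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        # One random salt per hash function (an assumption of this sketch).
        self.salts = [self.rng.getrandbits(64) for _ in range(choices)]

    def _locations(self, key):
        """Candidate slots for a key, one per salted hash function."""
        return [hash((salt, key)) % len(self.slots) for salt in self.salts]

    def contains(self, key) -> bool:
        return any(self.slots[i] == key for i in self._locations(key))

    def insert(self, key) -> None:
        if self.contains(key):
            return
        current = key
        for _ in range(self.max_kicks):
            locations = self._locations(current)
            # Take any free candidate slot if one exists.
            for i in locations:
                if self.slots[i] is None:
                    self.slots[i] = current
                    return
            # Otherwise evict a random occupant and continue the walk with it.
            victim = self.rng.choice(locations)
            current, self.slots[victim] = self.slots[victim], current
        # The walk gave up: `current` is no longer stored in the table.
        # A real table would resize or rehash at this point.
        raise RuntimeError(f"no free slot found; {current!r} is unplaced")
```

For example, `table = RandomWalkCuckooTable()` followed by `table.insert(42)` makes `table.contains(42)` return `True`. Lookups inspect only the candidate slots of a key, so they cost at most `choices` probes regardless of how long an insertion walk took.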

Book ChapterDOI
11 Dec 2007
TL;DR: Results show that the proposed method can resist perceptually insignificant modifications such as compression, filtering, scaling and rotation and is also able to successfully detect content changing attacks such as insertion of foreign objects.
Abstract: Image hash functions based on image content have applications in watermarking, authentication and image retrieval. This paper presents an algorithm for generating an image hash that is robust against content-preserving modifications and, at the same time, is capable of detecting malicious tampering. Robust features are first extracted from the discrete wavelet transform followed by the Radon transform. Probabilistic quantization is then used to map the feature values to a binary sequence. Results show that the proposed method can resist perceptually insignificant modifications such as compression, filtering, scaling and rotation. It is also able to successfully detect content-changing attacks such as insertion of foreign objects.

32 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    33
2022    89
2021    11
2020    16
2019    16
2018    38