Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published within this topic, receiving 51,462 citations.


Papers
Journal ArticleDOI
TL;DR: A perceptual image hashing method with Locality Preserving Projection (LPP) is proposed to improve robustness against content-preserving operations; Gabor filtering is leveraged to adaptively extract orientation and structure features consistent with the response of the human visual system.
Abstract: Perceptual image hashing is an effective and efficient way to identify images in large-scale databases, where the two major performance criteria are robustness and discrimination. A better tradeoff between robustness and discrimination remains a severe challenge for current hashing research. Aiming at this issue, we design a novel perceptual image Hashing with Locality Preserving Projection (LPP), hereafter HLPP. Specifically, to improve robustness against content-preserving operations, Gabor filtering is leveraged to adaptively extract orientation and structure features, which are consistent with the response of the human visual system. LPP is adopted to learn the intrinsic local structure from the maximum Gabor filtering response. The use of LPP can discover meaningful low-dimensional information hidden in the maximum Gabor filtering response and thus improves the discrimination of HLPP. During hash similarity calculation, the Hamming distance is selected as the metric. The tradeoff between robustness and discrimination is validated on benchmark databases, and the results indicate that the proposed HLPP is superior to some state-of-the-art algorithms. In addition, extensive copy-detection experiments also demonstrate that the proposed HLPP provides higher accuracy than the compared algorithms.
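The abstract notes that HLPP compares hash codes with the Hamming distance. As a minimal sketch of that similarity metric (not the paper's full pipeline), the distance between two binary hash codes is simply the number of bit positions where they differ:

```python
import numpy as np

def hamming_distance(h1, h2):
    # Count of bit positions where the two binary hash codes differ;
    # smaller distances mean more similar images.
    h1, h2 = np.asarray(h1), np.asarray(h2)
    return int(np.count_nonzero(h1 != h2))

# Two illustrative 8-bit hash codes differing in two positions
a = np.array([1, 0, 1, 1, 0, 0, 1, 0])
b = np.array([1, 1, 1, 0, 0, 0, 1, 0])
d = hamming_distance(a, b)  # -> 2
```

In a copy-detection setting, a query image would be declared a copy of a database image when this distance falls below a chosen threshold.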

4 citations

DOI
01 Jan 2016
TL;DR: A series of works addressing the issues of scalability and compactness within machine learning and its applications to heuristic search, including HashedNets, a general approach to compressing neural network models that leverages feature hashing.
Abstract of the dissertation "Learning with Scalability and Compactness" by Wenlin Chen, Doctor of Philosophy in Computer Science, Washington University in St. Louis, 2016; Professor Yixin Chen, Chair. Artificial Intelligence has been thriving for decades since its birth. Traditional AI features heuristic search and planning, providing good strategies for tasks that are inherently search-based problems, such as games and GPS searching. In the meantime, machine learning, arguably the hottest subfield of AI, embraces data-driven methodology with great success in a wide range of applications such as computer vision and speech recognition. As a new trend, the applications of both learning and search have shifted toward mobile and embedded devices, which entails not only scalability but also compactness of the models. Under this general paradigm, we propose a series of works to address the issues of scalability and compactness within machine learning and its applications to heuristic search. We first focus on the scalability issue of memory-based heuristic search, which was recently ameliorated by Maximum Variance Unfolding (MVU), a manifold learning algorithm capable of learning state embeddings as effective heuristics to speed up A* search. Though achieving unprecedented online search performance under constraints on memory footprint, MVU is notoriously slow in offline training. To address this problem, we introduce Maximum Variance Correction (MVC), which finds large-scale feasible solutions to MVU by post-processing embeddings from any manifold learning algorithm. It increases the scale of MVU embeddings by several orders of magnitude and is naturally parallel. We further propose the Goal-oriented Euclidean Heuristic (GOEH), a variant of MVU embeddings, which preferentially optimizes the heuristics associated with goals in the embedding while maintaining their admissibility. We demonstrate unmatched reductions in search time across several non-trivial A* benchmark search problems.
Through these works, we bridge the gap between the manifold learning literature and heuristic search, which have been regarded as fundamentally different, leading to cross-fertilization for both fields. Deep learning has made a big splash in the machine learning community with its superior accuracy. However, this comes at the price of huge model size, which may involve billions of parameters and poses great challenges for use on mobile and embedded devices. To achieve compactness, we propose HashedNets, a general approach to compressing neural network models that leverages feature hashing. At its core, HashedNets randomly groups parameters using a low-cost hash function and shares parameter values within each group. According to our empirical results, a neural network can be 32x smaller with little drop in accuracy. We further introduce Frequency-Sensitive Hashed Nets (FreshNets) to extend this hashing technique to convolutional neural networks by compressing parameters in the frequency domain. Compared with many AI applications, neural networks do not seem to be gaining as much popularity as they should in traditional data mining tasks. For these tasks, categorical features must first be converted to a numerical representation in order for neural networks to process them. We show that a naive use of the classic one-hot encoding may result in gigantic weight matrices and therefore lead to prohibitively expensive memory cost in neural
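The core HashedNets idea described above, grouping weights with a cheap hash function and sharing one value per group, can be sketched as follows. This is a simplified illustration under assumed names (`bucket`, `expand_hashed_weights`), not the dissertation's implementation:

```python
import hashlib
import numpy as np

def bucket(i, j, k, seed=0):
    # Low-cost hash mapping a (row, col) weight index to one of k shared buckets.
    key = f"{seed}:{i}:{j}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % k

def expand_hashed_weights(w, rows, cols, seed=0):
    # Materialize the virtual rows x cols weight matrix whose entries are
    # drawn from the k shared trainable parameters in w.
    k = len(w)
    V = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            V[i, j] = w[bucket(i, j, k, seed)]
    return V

# A 4x8 virtual matrix (32 weights) backed by only 4 true parameters:
# an 8x reduction in stored weights.
w = np.array([0.5, -1.2, 0.3, 2.0])
V = expand_hashed_weights(w, 4, 8)
```

Only `w` is stored and trained; the full matrix is reconstructed on the fly from the hash function, which is what yields the memory compression.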

4 citations

Book ChapterDOI
16 Sep 2015
TL;DR: A new supervised hashing method that generates class-specific hash codes, using an inductive process based on the Inductive Manifold Hashing (IMH) model and leveraging supervised information during hash-code generation to address these difficulties and boost hashing quality.
Abstract: Recent years have witnessed the effectiveness and efficiency of learning-based hashing methods, which generate short binary codes preserving the Euclidean similarity of the original high-dimensional space. However, because of their complexity and out-of-sample problems, most methods are not appropriate for embedding large-scale datasets. In this paper, we propose a new supervised hashing method to generate class-specific hash codes, which uses an inductive process based on the Inductive Manifold Hashing (IMH) model and leverages supervised information in hash-code generation to address these difficulties and boost hashing quality. It is experimentally shown that this method achieves excellent image classification and retrieval performance on a large-scale multimedia dataset with very short binary codes.

4 citations

Posted Content
TL;DR: A low-cost feature extraction approach and an effective deep neural network architecture for accurate and fast malware detection, which utilizes the feature hashing trick to encode the API call arguments associated with the API name.
Abstract: Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g. system API calls) for malware detection. This technique has proven effective against various code obfuscation techniques and newly released ("zero-day") malware. However, existing works typically consider only the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel, low-cost feature extraction approach and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes the feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through a bidirectional LSTM (long short-term memory network) to learn the sequential correlation among API calls. Experiments show that our solution significantly outperforms baselines on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.
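The feature hashing trick mentioned above maps an unbounded vocabulary of API names and argument strings into a fixed-size vector without maintaining a dictionary. A minimal sketch, with illustrative token names that are not taken from the paper:

```python
import hashlib
import numpy as np

def hash_features(tokens, dim=16):
    # Hashing trick: fold arbitrary string tokens into a fixed-size vector.
    # Each token's hash picks an index; a second hash bit picks a sign,
    # which reduces the bias introduced by collisions.
    v = np.zeros(dim)
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        idx = h % dim
        sign = 1.0 if (h >> 4) % 2 == 0 else -1.0
        v[idx] += sign
    return v

# Encode an API name together with its arguments as one fixed-length vector
# (hypothetical API call, for illustration only).
vec = hash_features(["CreateFileW", "arg:path=C:\\temp\\a.exe", "arg:mode=w"])
```

Because the output dimension is fixed regardless of how many distinct arguments appear at run time, vectors like `vec` can be fed directly into a downstream neural network.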

4 citations

Patent
21 Dec 2005
TL;DR: In this paper, a method of hash string extraction from biometric information is disclosed, which comprises the steps of providing biometric information in the form of a fingerprint, extracting features from the fingerprint, encoding the features based on their location within the fingerprint, and generating a string of values based on the extracted features and their determined locations.
Abstract: A method of hash string extraction from biometric information is disclosed. The method comprises the steps of providing a biometric information sample in the form of a fingerprint, for example; extracting features from the biometric information sample and encoding the features based on their location within the sample; and generating a string of values based on the extracted features and their determined locations. The method further comprises hashing the string of values to produce a plurality of hash values and comparing them against stored hash values to identify a user.
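The patent's pipeline, encoding located features into a string and deriving several hash values from it, can be sketched as follows. All names and the feature encoding here are hypothetical stand-ins; the patent does not specify these details:

```python
import hashlib

def feature_string(features):
    # features: (label, x, y) tuples standing in for minutiae extracted from
    # a fingerprint; each feature is encoded together with its location.
    return ";".join(f"{label}@{x},{y}" for label, x, y in sorted(features))

def hash_values(s, n=4):
    # Derive a plurality of hash values from the feature string; a user is
    # identified by comparing these against stored hash values.
    return [hashlib.sha256(f"{i}:{s}".encode()).hexdigest() for i in range(n)]

feats = [("ridge_ending", 10, 42), ("bifurcation", 55, 7)]
codes = hash_values(feature_string(feats))
```

Sorting the features before concatenation makes the string, and hence the hash values, independent of extraction order.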

4 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  33
2022  89
2021  11
2020  16
2019  16
2018  38