Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published within this topic, receiving 51,462 citations.


Papers
Journal ArticleDOI
TL;DR: A perceptual image hashing method with Locality Preserving Projection (LPP) is proposed to improve robustness against content-preserving operations; Gabor filtering is leveraged to adaptively extract orientation and structure features consistent with the response of the human visual system.
Abstract: Perceptual image hashing is an effective and efficient way to identify images in large-scale databases, where the two major performance criteria are robustness and discrimination. A better tradeoff between robustness and discrimination remains a severe challenge for current hashing research. Aiming at this issue, we design a novel perceptual image Hashing with Locality Preserving Projection (LPP), hereafter HLPP. Specifically, to improve robustness against content-preserving operations, Gabor filtering is leveraged to adaptively extract orientation and structure features, which are consistent with the response of the human visual system. LPP is adopted to learn the intrinsic local structure from the maximum Gabor filtering response. The use of LPP can discover meaningful low-dimensional information hidden in the maximum Gabor filtering response and thus improves the discrimination of HLPP. During hash similarity calculation, the Hamming distance is selected as the metric. The tradeoff between robustness and discrimination is validated on benchmark databases, and the results indicate that the proposed HLPP is superior to some state-of-the-art algorithms. In addition, extensive copy-detection experiments also demonstrate that the proposed HLPP provides higher accuracy than the compared algorithms.
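The abstract notes that HLPP compares hash codes with the Hamming distance. As a minimal sketch of that similarity metric (not the paper's full pipeline), the distance between two binary hash codes is simply the number of bit positions where they differ:

```python
import numpy as np

def hamming_distance(h1, h2):
    # Count of bit positions where the two binary hash codes differ;
    # smaller distances mean more similar images.
    h1, h2 = np.asarray(h1), np.asarray(h2)
    return int(np.count_nonzero(h1 != h2))

# Two illustrative 8-bit hash codes differing in two positions
a = np.array([1, 0, 1, 1, 0, 0, 1, 0])
b = np.array([1, 1, 1, 0, 0, 0, 1, 0])
d = hamming_distance(a, b)  # -> 2
```

In a copy-detection setting, a query image would be declared a copy of a database image when this distance falls below a chosen threshold.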

4 citations

DOI
01 Jan 2016
TL;DR: A series of works addressing the issues of scalability and compactness within machine learning and its applications to heuristic search, including HashedNets, a general approach to compressing neural network models that leverages feature hashing.
Abstract of the dissertation "Learning with Scalability and Compactness" by Wenlin Chen, Doctor of Philosophy in Computer Science, Washington University in St. Louis, 2016; Professor Yixin Chen, Chair. Artificial Intelligence has been thriving for decades since its birth. Traditional AI features heuristic search and planning, providing good strategies for tasks that are inherently search-based problems, such as games and GPS searching. In the meantime, machine learning, arguably the hottest subfield of AI, embraces data-driven methodology with great success in a wide range of applications such as computer vision and speech recognition. As a new trend, the applications of both learning and search have shifted toward mobile and embedded devices, which entails not only scalability but also compactness of the models. Under this general paradigm, we propose a series of works to address the issues of scalability and compactness within machine learning and its applications to heuristic search. We first focus on the scalability issue of memory-based heuristic search, which was recently ameliorated by Maximum Variance Unfolding (MVU), a manifold learning algorithm capable of learning state embeddings as effective heuristics to speed up A* search. Though achieving unprecedented online search performance under constraints on memory footprint, MVU is notoriously slow in offline training. To address this problem, we introduce Maximum Variance Correction (MVC), which finds large-scale feasible solutions to MVU by post-processing embeddings from any manifold learning algorithm. It increases the scale of MVU embeddings by several orders of magnitude and is naturally parallel. We further propose the Goal-oriented Euclidean Heuristic (GOEH), a variant of MVU embeddings, which preferentially optimizes the heuristics associated with goals in the embedding while maintaining their admissibility. We demonstrate unmatched reductions in search time across several non-trivial A* benchmark search problems.
Through these works, we bridge the gap between the manifold learning literature and heuristic search, which have been regarded as fundamentally different, leading to cross-fertilization for both fields. Deep learning has made a big splash in the machine learning community with its superior accuracy. However, this comes at the price of huge model size, which may involve billions of parameters and poses great challenges for use on mobile and embedded devices. To achieve compactness, we propose HashedNets, a general approach to compressing neural network models that leverages feature hashing. At its core, HashedNets randomly groups parameters using a low-cost hash function and shares parameter values within each group. According to our empirical results, a neural network can be 32x smaller with little drop in accuracy. We further introduce Frequency-Sensitive Hashed Nets (FreshNets) to extend this hashing technique to convolutional neural networks by compressing parameters in the frequency domain. Compared with many AI applications, neural networks do not seem to be gaining as much popularity as they should in traditional data mining tasks. For these tasks, categorical features must first be converted to a numerical representation in order for neural networks to process them. We show that a naive use of the classic one-hot encoding may result in gigantic weight matrices and therefore lead to prohibitively expensive memory cost in neural
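The core HashedNets idea described above, grouping weights with a cheap hash function and sharing one value per group, can be sketched as follows. This is a simplified illustration under assumed names (`bucket`, `expand_hashed_weights`), not the dissertation's implementation:

```python
import hashlib
import numpy as np

def bucket(i, j, k, seed=0):
    # Low-cost hash mapping a (row, col) weight index to one of k shared buckets.
    key = f"{seed}:{i}:{j}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % k

def expand_hashed_weights(w, rows, cols, seed=0):
    # Materialize the virtual rows x cols weight matrix whose entries are
    # drawn from the k shared trainable parameters in w.
    k = len(w)
    V = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            V[i, j] = w[bucket(i, j, k, seed)]
    return V

# A 4x8 virtual matrix (32 weights) backed by only 4 true parameters:
# an 8x reduction in stored weights.
w = np.array([0.5, -1.2, 0.3, 2.0])
V = expand_hashed_weights(w, 4, 8)
```

Only `w` is stored and trained; the full matrix is reconstructed on the fly from the hash function, which is what yields the memory compression.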

4 citations

Book ChapterDOI
16 Sep 2015
TL;DR: A new supervised hashing method that generates class-specific hash codes, using an inductive process based on the Inductive Manifold Hashing (IMH) model and leveraging supervised information during hash-code generation to address these difficulties and boost hashing quality.
Abstract: Recent years have witnessed the effectiveness and efficiency of learning-based hashing methods, which generate short binary codes preserving the Euclidean similarity of the original high-dimensional space. However, because of their complexity and out-of-sample problems, most methods are not appropriate for embedding large-scale datasets. In this paper, we propose a new supervised hashing method to generate class-specific hash codes, which uses an inductive process based on the Inductive Manifold Hashing (IMH) model and leverages supervised information in hash-code generation to address these difficulties and boost hashing quality. It is experimentally shown that this method achieves excellent image classification and retrieval performance on a large-scale multimedia dataset with very short binary codes.

4 citations

Posted Content
TL;DR: A low-cost feature extraction approach and an effective deep neural network architecture for accurate and fast malware detection, which utilizes the feature hashing trick to encode the API call arguments associated with the API name.
Abstract: Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g. system API calls) for malware detection. This technique has proven effective against various code obfuscation techniques and newly released ("zero-day") malware. However, existing works typically consider only the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel, low-cost feature extraction approach and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes the feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through a bidirectional LSTM (long short-term memory network) to learn the sequential correlation among API calls. Experiments show that our solution significantly outperforms baselines on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.
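The feature hashing trick mentioned above maps an unbounded vocabulary of API names and argument strings into a fixed-size vector without maintaining a dictionary. A minimal sketch, with illustrative token names that are not taken from the paper:

```python
import hashlib
import numpy as np

def hash_features(tokens, dim=16):
    # Hashing trick: fold arbitrary string tokens into a fixed-size vector.
    # Each token's hash picks an index; a second hash bit picks a sign,
    # which reduces the bias introduced by collisions.
    v = np.zeros(dim)
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        idx = h % dim
        sign = 1.0 if (h >> 4) % 2 == 0 else -1.0
        v[idx] += sign
    return v

# Encode an API name together with its arguments as one fixed-length vector
# (hypothetical API call, for illustration only).
vec = hash_features(["CreateFileW", "arg:path=C:\\temp\\a.exe", "arg:mode=w"])
```

Because the output dimension is fixed regardless of how many distinct arguments appear at run time, vectors like `vec` can be fed directly into a downstream neural network.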

4 citations

Patent
21 Dec 2005
TL;DR: In this paper, a method of hash string extraction from biometric information is disclosed, which comprises the steps of providing biometric information in the form of a fingerprint, extracting features from the fingerprint, encoding the features based on their location within the fingerprint, and generating a string of values based on the extracted features and their determined locations.
Abstract: A method of hash string extraction from biometric information is disclosed. The method comprises the steps of providing a biometric information sample in the form of a fingerprint, for example; extracting features from the biometric information sample and encoding the features based on their location within the sample; and generating a string of values based on the extracted features and their determined locations. The method further comprises hashing the string of values to produce a plurality of hash values and comparing them against stored hash values to identify a user.
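The patent's pipeline, encoding located features into a string and deriving several hash values from it, can be sketched as follows. All names and the feature encoding here are hypothetical stand-ins; the patent does not specify these details:

```python
import hashlib

def feature_string(features):
    # features: (label, x, y) tuples standing in for minutiae extracted from
    # a fingerprint; each feature is encoded together with its location.
    return ";".join(f"{label}@{x},{y}" for label, x, y in sorted(features))

def hash_values(s, n=4):
    # Derive a plurality of hash values from the feature string; a user is
    # identified by comparing these against stored hash values.
    return [hashlib.sha256(f"{i}:{s}".encode()).hexdigest() for i in range(n)]

feats = [("ridge_ending", 10, 42), ("bifurcation", 55, 7)]
codes = hash_values(feature_string(feats))
```

Sorting the features before concatenation makes the string, and hence the hash values, independent of extraction order.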

4 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  33
2022  89
2021  11
2020  16
2019  16
2018  38