scispace - formally typeset
Search or ask a question
Topic

Feature hashing

About: Feature hashing is a research topic. Over the lifetime, 993 publications have been published within this topic receiving 51462 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The system proposes to represent the network flows by using feature hashing, which can dramatically reduce the high-dimensional feature spaces that are general in malware analysis, and introduces the signature generation algorithm based on Bayesian method.
Abstract: The volume of malwares is growing at an exponential speed nowadays. This huge growth makes it extremely hard to analyse malware manually. Most existing signatures extracting methods are based on string signatures, and string matching is not accurate and time consuming. Therefore, this paper presents AutoMal, a system for automatically extracting signatures from large-scale malwares. Firstly, the system proposes to represent the network flows by using feature hashing, which can dramatically reduce the high-dimensional feature spaces that are general in malware analysis. Then, we design a clustering and median filtering method to classify the malware vectors into different types. Finally, it introduces the signature generation algorithm based on Bayesian method. The system can extract both the byte signature and the hash signature of malwares from its network flow with low false positive and zero false negative. Our evaluation shows that AutoMal can generate strongly noise-resisted signatures that exactly depict the characteristics of malware. Copyright © 2014 John Wiley & Sons, Ltd.

5 citations

Journal ArticleDOI
TL;DR: HW-Forest employs perceptual hashing algorithm to calculate the similarity between feature vectors in hashing screening strategy, which is used to remove the redundant feature vectors produced by multi-grained scanning and can significantly decrease the time cost and memory consumption.
Abstract: As a novel deep learning model, gcForest has been widely used in various applications. However, current multi-grained scanning of gcForest produces many redundant feature vectors, and this increases the time cost of the model. To screen out redundant feature vectors, we introduce a hashing screening mechanism for multi-grained scanning and propose a model called HW-Forest which adopts two strategies: hashing screening and window screening. HW-Forest employs perceptual hashing algorithm to calculate the similarity between feature vectors in hashing screening strategy, which is used to remove the redundant feature vectors produced by multi-grained scanning and can significantly decrease the time cost and memory consumption. Furthermore, we adopt a self-adaptive instance screening strategy called window screening to improve the performance of our approach, which can achieve higher accuracy without hyperparameter tuning on different datasets. Our experimental results show that HW-Forest has higher accuracy than other models, and the time cost is also reduced.

5 citations

Posted Content
TL;DR: In this article, a general framework for learning hash functions using affinity-based loss functions that uses auxiliary coordinates is proposed, which can be seen as a corrected, iterated version of the procedure of optimizing first over the codes and then learning the hash function.
Abstract: In supervised binary hashing, one wants to learn a function that maps a high-dimensional feature vector to a vector of binary codes, for application to fast image retrieval. This typically results in a difficult optimization problem, nonconvex and nonsmooth, because of the discrete variables involved. Much work has simply relaxed the problem during training, solving a continuous optimization, and truncating the codes a posteriori. This gives reasonable results but is quite suboptimal. Recent work has tried to optimize the objective directly over the binary codes and achieved better results, but the hash function was still learned a posteriori, which remains suboptimal. We propose a general framework for learning hash functions using affinity-based loss functions that uses auxiliary coordinates. This closes the loop and optimizes jointly over the hash functions and the binary codes so that they gradually match each other. The resulting algorithm can be seen as a corrected, iterated version of the procedure of optimizing first over the codes and then learning the hash function. Compared to this, our optimization is guaranteed to obtain better hash functions while being not much slower, as demonstrated experimentally in various supervised datasets. In addition, our framework facilitates the design of optimization algorithms for arbitrary types of loss and hash functions.

5 citations

Book ChapterDOI
01 Jan 2011
TL;DR: This work analyzes and presents preliminary empirical results on a set of learning architectures based on a feature sharding approach that present various tradeoffs between delay, degree of parallelism, representation power and empirical performance.

5 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Support vector machine
73.6K papers, 1.7M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202333
202289
202111
202016
201916
201838