Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published on this topic, receiving 51,462 citations.

Papers

Journal Article

AutoMal: automatic clustering and signature generation for malwares based on the network flow

Sun Hao¹, Wen Wang¹, Huabiao Lu¹, Peige Ren¹

¹National University of Defense Technology

10 Jul 2015, Security and Communication Networks

TL;DR: The system represents network flows using feature hashing, which can dramatically reduce the high-dimensional feature spaces common in malware analysis, and introduces a signature generation algorithm based on a Bayesian method.

Abstract: The volume of malware is growing exponentially, which makes manual malware analysis extremely difficult. Most existing signature extraction methods are based on string signatures, and string matching is inaccurate and time-consuming. This paper therefore presents AutoMal, a system for automatically extracting signatures from large-scale malware. First, the system represents network flows using feature hashing, which can dramatically reduce the high-dimensional feature spaces common in malware analysis. Then, we design a clustering and median filtering method to classify the malware vectors into different types. Finally, the system uses a signature generation algorithm based on a Bayesian method. The system can extract both byte signatures and hash signatures of malware from its network flows with a low false-positive rate and zero false negatives. Our evaluation shows that AutoMal can generate highly noise-resistant signatures that precisely capture the characteristics of malware. Copyright © 2014 John Wiley & Sons, Ltd.
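
The feature-hashing step described above can be illustrated with a minimal sketch of the general hashing trick. The abstract does not say how AutoMal tokenizes flows or which hash it uses, so the tokenizer `flow_tokens` (two header fields plus payload byte 3-grams), the name `hash_features`, the MD5-based bucket hash, and the ±1 sign bit are all assumptions for illustration, not AutoMal's actual design.

```python
import hashlib
from collections import Counter


def flow_tokens(proto, dst_port, payload, n=3):
    """Hypothetical tokenizer: two header fields plus byte n-grams of the payload."""
    tokens = [f"proto={proto}", f"dport={dst_port}"]
    tokens += [payload[i:i + n].hex() for i in range(len(payload) - n + 1)]
    return tokens


def hash_features(tokens, n_buckets=2 ** 10):
    """Hashing trick: fold an unbounded token vocabulary into n_buckets slots."""
    vec = [0.0] * n_buckets
    for token, count in Counter(tokens).items():
        # hashlib gives the same digest in every process, unlike the salted
        # built-in hash() for str, so vectors are reproducible across runs.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")
        index = h % n_buckets                          # bucket the token lands in
        sign = 1.0 if ((h >> 63) & 1) == 0 else -1.0   # top bit picks +1 or -1
        vec[index] += sign * count
    return vec


vec = hash_features(flow_tokens("tcp", 80, b"GET /index.html"), n_buckets=256)
```

However many distinct tokens a flow contains, it maps to a vector of exactly `n_buckets` entries; the price is that distinct tokens may collide in one bucket, and the random sign makes such collisions cancel on average rather than always add up.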

5 citations

Journal Article

A Novel Simple Visual Tracking Algorithm Based on Hashing and Deep Learning

Zhu Suguo, Junping Du, Ren Nan

01 Sep 2017, Chinese Journal of Electronics

5 citations

Journal Article

HW-Forest: Deep Forest with Hashing Screening and Window Screening

Pengfei Ma, Youxi Wu, Yang Li, Lei Guo, He Jiang, Xingquan Zhu, Xin Wu

04 May 2022, ACM Transactions on Knowledge Discovery From Data

TL;DR: In its hashing screening strategy, HW-Forest uses a perceptual hashing algorithm to calculate the similarity between feature vectors; this removes the redundant feature vectors produced by multi-grained scanning and can significantly decrease time cost and memory consumption.

Abstract: As a novel deep learning model, gcForest has been widely used in various applications. However, the multi-grained scanning in the current gcForest produces many redundant feature vectors, which increases the model's time cost. To screen out redundant feature vectors, we introduce a hashing screening mechanism for multi-grained scanning and propose a model called HW-Forest, which adopts two strategies: hashing screening and window screening. In the hashing screening strategy, HW-Forest employs a perceptual hashing algorithm to calculate the similarity between feature vectors; this removes the redundant feature vectors produced by multi-grained scanning and can significantly decrease time cost and memory consumption. Furthermore, we adopt a self-adaptive instance screening strategy called window screening to improve the performance of our approach, which achieves higher accuracy on different datasets without hyperparameter tuning. Our experimental results show that HW-Forest achieves higher accuracy than other models while also reducing time cost.
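
The hashing screening idea can be sketched as a greedy near-duplicate filter. The abstract does not name the perceptual hash HW-Forest uses, so this example assumes an average-hash-style fingerprint (one bit per component, set when the component is at least the vector's mean) compared by Hamming distance; `average_hash`, `hamming`, `hashing_screen`, and the `max_distance` threshold are hypothetical names, and the first-come keep order is a simplification.

```python
def average_hash(vec):
    """Average-hash-style fingerprint: bit i is 1 if vec[i] >= mean(vec)."""
    mean = sum(vec) / len(vec)
    bits = 0
    for i, x in enumerate(vec):
        if x >= mean:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def hashing_screen(vectors, max_distance=1):
    """Keep a vector only if its fingerprint is more than max_distance bits
    away from every fingerprint kept so far (greedy, order-dependent)."""
    kept, fingerprints = [], []
    for v in vectors:
        fp = average_hash(v)
        if all(hamming(fp, f) > max_distance for f in fingerprints):
            kept.append(v)
            fingerprints.append(fp)
    return kept
```

For example, `[0.1, 0.9, 0.2, 0.8]` and `[0.12, 0.88, 0.21, 0.79]` receive identical fingerprints, so only the first survives, while `[0.9, 0.1, 0.8, 0.2]` differs in all four bits and is kept. Comparing short integer fingerprints is cheaper than comparing full real-valued vectors, which is the kind of saving the abstract attributes to screening.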

5 citations

Posted Content

Optimizing affinity-based binary hashing using auxiliary coordinates

Ramin Raziperchikolaei¹, Miguel Á. Carreira-Perpiñán¹

¹University of California, Merced

21 Jan 2015, arXiv: Learning

TL;DR: The authors propose a general framework for learning hash functions with affinity-based loss functions using auxiliary coordinates; the resulting algorithm can be seen as a corrected, iterated version of the procedure of first optimizing over the codes and then learning the hash function.

Abstract: In supervised binary hashing, one wants to learn a function that maps a high-dimensional feature vector to a vector of binary codes, for application to fast image retrieval. This typically results in a difficult optimization problem, nonconvex and nonsmooth, because of the discrete variables involved. Much work has simply relaxed the problem during training, solving a continuous optimization and truncating the codes a posteriori. This gives reasonable results but is quite suboptimal. Recent work has tried to optimize the objective directly over the binary codes and achieved better results, but the hash function was still learned a posteriori, which remains suboptimal. We propose a general framework for learning hash functions using affinity-based loss functions that uses auxiliary coordinates. This closes the loop and optimizes jointly over the hash functions and the binary codes so that they gradually match each other. The resulting algorithm can be seen as a corrected, iterated version of the procedure of first optimizing over the codes and then learning the hash function. Compared to this, our optimization is guaranteed to obtain better hash functions while not being much slower, as demonstrated experimentally on various supervised datasets. In addition, our framework facilitates the design of optimization algorithms for arbitrary types of loss and hash functions.
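
The auxiliary-coordinates idea can be sketched as a toy alternating optimization: the binary codes `Z` become extra variables, a Z step optimizes them under the affinity loss plus a penalty `mu * ||Z - h(X)||^2` tying them to the hash outputs, an h step fits each bit's hash function to the current codes, and `mu` is increased so that codes and hash function are driven to agree. This is an illustration under stated assumptions, not the authors' implementation: the KSH-style loss, the linear threshold hash function, the greedy bit-flipping Z step, the perceptron h step, and the `mu` schedule are all choices made here for brevity.

```python
import random


def sign(x):
    """Map a real number to a bit in {-1, +1} (zero maps to +1)."""
    return 1 if x >= 0 else -1


def hash_fn(W, x):
    """Linear threshold hash: bit l is sign(w_l . [x, 1])."""
    xb = list(x) + [1.0]  # constant feature acts as the bias
    return [sign(sum(w * v for w, v in zip(w_l, xb))) for w_l in W]


def affinity_loss(Z, Y):
    """KSH-style loss over labelled pairs: (z_n . z_m - n_bits * y_nm)^2."""
    n_bits = len(Z[0])
    total = 0.0
    for n in range(len(Z)):
        for m in range(n + 1, len(Z)):
            if Y[n][m] != 0:  # 0 marks an unlabelled pair
                dot = sum(a * b for a, b in zip(Z[n], Z[m]))
                total += (dot - n_bits * Y[n][m]) ** 2
    return total


def penalized_objective(Z, W, X, Y, mu):
    """Affinity loss on the auxiliary codes plus mu * ||Z - h(X)||^2."""
    penalty = sum(
        (a - b) ** 2 for z, x in zip(Z, X) for a, b in zip(z, hash_fn(W, x))
    )
    return affinity_loss(Z, Y) + mu * penalty


def z_step(Z, W, X, Y, mu):
    """Greedy bit flipping (in place): keep any flip that lowers the objective."""
    best = penalized_objective(Z, W, X, Y, mu)
    improved = True
    while improved:
        improved = False
        for n in range(len(Z)):
            for l in range(len(Z[n])):
                Z[n][l] = -Z[n][l]
                value = penalized_objective(Z, W, X, Y, mu)
                if value < best:
                    best, improved = value, True
                else:
                    Z[n][l] = -Z[n][l]  # undo the flip
    return Z


def h_step(W, X, Z, epochs=20):
    """Fit bit l's linear classifier to targets Z[n][l] with perceptron updates."""
    for l, w_l in enumerate(W):
        for _ in range(epochs):
            for x, z in zip(X, Z):
                xb = list(x) + [1.0]
                if sign(sum(w * v for w, v in zip(w_l, xb))) != z[l]:
                    for i in range(len(w_l)):
                        w_l[i] += z[l] * xb[i]
    return W


def train_mac(X, Y, n_bits=2, mus=(0.1, 1.0, 10.0), seed=0):
    """Alternate Z and h steps while increasing mu."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(len(X[0]) + 1)] for _ in range(n_bits)]
    Z = [hash_fn(W, x) for x in X]  # start the codes at the initial hash outputs
    for mu in mus:
        Z = z_step(Z, W, X, Y, mu)
        W = h_step(W, X, Z)
    return W, Z
```

Here `Y[n][m]` is +1 for similar pairs, -1 for dissimilar pairs, and 0 for unlabelled pairs. Because every accepted flip strictly lowers the penalized objective, the Z step always terminates, and with small `mu` the codes follow the affinity loss while large `mu` pulls them toward what the hash function can actually produce.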

5 citations

Book Chapter

Scaling Up Machine Learning: Parallel Online Learning

Daniel Hsu, Nikos Karampatziakis, John Langford, Alexander J. Smola

01 Jan 2011

TL;DR: This work analyzes, and presents preliminary empirical results on, a set of learning architectures based on a feature sharding approach; these architectures offer different trade-offs among delay, degree of parallelism, representational power, and empirical performance.
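
Feature sharding can be illustrated with a minimal sketch for an online linear model: each feature is assigned to one shard by a stable hash, each shard keeps weights only for its own features, the shard outputs are summed into a global prediction, and the shared residual is sent back to every shard. The names `shard_of`, `split_by_shard`, `Shard`, and `train_step`, the CRC32 assignment, and the squared loss are assumptions, and this simplest synchronous variant ignores communication delay, one of the trade-offs the chapter considers.

```python
import zlib


def shard_of(feature, n_shards):
    """Assign a feature name to a shard with a stable hash (CRC32)."""
    return zlib.crc32(feature.encode("utf-8")) % n_shards


def split_by_shard(example, n_shards):
    """Split a sparse example {feature: value} into one sub-example per shard."""
    parts = [{} for _ in range(n_shards)]
    for feature, value in example.items():
        parts[shard_of(feature, n_shards)][feature] = value
    return parts


class Shard:
    """One worker's linear model over its own subset of features."""

    def __init__(self):
        self.weights = {}

    def predict(self, part):
        return sum(self.weights.get(f, 0.0) * v for f, v in part.items())

    def update(self, part, gradient, lr):
        for f, v in part.items():
            self.weights[f] = self.weights.get(f, 0.0) - lr * gradient * v


def train_step(shards, example, label, lr=0.1):
    """Squared-loss online step: shard predictions are summed into a global
    prediction, and the shared residual is sent back to every shard."""
    parts = split_by_shard(example, len(shards))
    prediction = sum(s.predict(p) for s, p in zip(shards, parts))
    gradient = prediction - label  # derivative of 0.5 * (p - y)^2 w.r.t. p
    for s, p in zip(shards, parts):
        s.update(p, gradient, lr)
    return prediction
```

Because the summed shard outputs equal the full linear prediction, this variant reproduces single-machine online gradient descent; its cost is that every shard must wait for the global prediction before it can update.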

5 citations

Metrics

Papers: 1,120
Citations: 57,460

Number of papers on this topic in previous years
Year	Papers
2023	33
2022	89
2021	11
2020	16
2019	16
2018	38
