Malware detection via API calls, topic models and machine learning

doi:10.1109/COASE.2015.7294263

Proceedings Article

- pp 1212-1217

TLDR

This work presents a model that uses text mining and topic modeling to detect malware, based on the types of API call sequences, and recommends Decision Tree because it yields 'if-then' rules, which could be used as an early warning expert system.

Abstract:

Dissemination of malicious code, also known as malware, poses severe challenges to cyber security. Malware authors embed software in seemingly innocuous executables, unknown to the user. The malware subsequently interacts with security-critical OS resources on the host system or network, in order to destroy information or to gather sensitive information such as passwords and credit card numbers. Malware authors typically use Application Programming Interface (API) calls to perpetrate these crimes. We present a model that uses text mining and topic modeling to detect malware, based on the types of API call sequences. We evaluated our technique on two publicly available datasets. We observed that Decision Tree and Support Vector Machine yielded significant results. We performed a t-test with respect to sensitivity for the two models and found no statistically significant difference between them. We recommend Decision Tree because it yields 'if-then' rules, which could be used as an early warning expert system.
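The abstract does not spell out how API call sequences are turned into input for text mining, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes each executable's trace is treated as a "document" whose "words" are API names, and it builds the document-term count matrix that a topic model such as LDA typically consumes. The function names `build_vocabulary` and `document_term_matrix` and the two toy traces are hypothetical and invented for illustration.

```python
from collections import Counter


def build_vocabulary(traces):
    """Return a sorted list of the distinct API names seen in any trace."""
    return sorted({call for trace in traces for call in trace})


def document_term_matrix(traces, vocabulary):
    """Count how often each vocabulary entry occurs in each trace.

    Each row corresponds to one trace (one "document"); each column
    corresponds to one API name in `vocabulary`.
    """
    matrix = []
    for trace in traces:
        counts = Counter(trace)  # missing API names count as 0
        matrix.append([counts[api] for api in vocabulary])
    return matrix


# Hypothetical toy traces (assumption: not taken from the paper's datasets).
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey", "RegSetValueExW"],
]

vocabulary = build_vocabulary(traces)
matrix = document_term_matrix(traces, vocabulary)

for api, column in zip(vocabulary, zip(*matrix)):
    print(f"{api:16s} {list(column)}")
```

Under these assumptions, the rows of `matrix` could be passed to a topic model, and the resulting per-trace topic proportions could serve as features for a classifier such as a Decision Tree or a Support Vector Machine. Note that plain counts discard call order; capturing order would need something like n-grams of consecutive calls.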

Citations

Journal Article

A survey of the applications of text mining in financial domain

B. Shravan Kumar, +1 more

- 15 Dec 2016 -

Knowledge Based Systems

TL;DR: A state-of-the-art survey of various applications of text mining to finance, categorized broadly into FOREX rate prediction, stock market prediction, customer relationship management (CRM) and cyber security.

Journal Article

Malicious sequential pattern mining for automatic malware detection

Yujie Fan, +2 more

- 15 Jun 2016 -

Expert Systems With Applications

TL;DR: An effective framework using a sequence mining technique and an All-Nearest-Neighbor (ANN) classifier is constructed for malware detection based on the discovered patterns. Promising experimental results show that the framework outperforms alternative data-mining-based detection methods in identifying new malicious executables.

Posted Content

A Survey on Sensor-based Threats to Internet-of-Things (IoT) Devices and Applications

Amit Kumar Sikder, +4 more

- 06 Feb 2018 -

arXiv: Cryptography and Security

TL;DR: This survey explores various threats targeting IoT devices, discusses how their sensors can be abused for malicious purposes, and reviews in detail the existing sensor-based threats and the countermeasures developed specifically to secure the sensors of IoT devices.

Journal Article

Graph embedding as a new approach for unknown malware detection

Hashem Hashemi, +3 more

- 01 Aug 2017 -

Journal of Computer Virology and Hacking...

TL;DR: The main advantages of the proposed method are high detection rate despite utilizing simple classifiers like KNN, acceptable computational complexity even in large scale datasets against rival methods, and low false positive rate.

Journal Article

A Survey on Sensor-Based Threats and Attacks to Smart Devices and Applications

Amit Kumar Sikder, +4 more

- 08 Mar 2021 -

IEEE Communications Surveys and Tutorial...

TL;DR: This paper presents a detailed survey of existing sensor-based threats and attacks to smart devices and of the countermeasures that have been developed to secure smart devices from sensor-based threats.

References

Journal Article

Latent Dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Journal Article

Finding scientific topics

Thomas L. Griffiths, +1 more

- 06 Apr 2004 -

Proceedings of the National Academy of S...

TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.

Estimating the support of a high-dimensional distribution

Bernhard Schölkopf, +2 more

TL;DR: The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data by carrying out sequential optimization over pairs of input patterns and providing a theoretical analysis of the statistical performance of the algorithm.

Journal Article

Estimating the Support of a High-Dimensional Distribution

Bernhard Schölkopf, +4 more

- 01 Jul 2001 -

Neural Computation

TL;DR: In this paper, the authors propose a method to estimate a function f that is positive on S, a subset of input space intended to contain most of the data, and negative on the complement of S. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space.
