Proceedings Article

Malware detection via API calls, topic models and machine learning

TL;DR
This work presents a model that uses text mining and topic modeling to detect malware based on the types of API call sequences. It recommends a Decision Tree classifier because it yields ‘if-then’ rules that could be used as an early-warning expert system.
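A minimal sketch of the topic-modeling step follows. It assumes the topic model is Latent Dirichlet Allocation fit by collapsed Gibbs sampling; the summary does not name the model or the inference method. Each executable's API call trace is treated as a text document whose words are API names, and the per-document topic proportions then serve as a feature vector for a classifier such as a Decision Tree or SVM. The function name `lda_gibbs`, its hyperparameters, and the sample traces are illustrative assumptions, not the authors' code or data.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper).

    docs: list of token lists, e.g. the API call names of one executable.
    Returns per-document topic proportions (theta), one list per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    # Count tables: document-topic, topic-word, and topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed topic proportions for each document (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        total = len(doc) + K * alpha
        theta.append([(n_dk[d][t] + alpha) / total for t in range(K)])
    return theta

# Hypothetical API call traces, one per executable (illustrative only).
traces = [
    "OpenProcess VirtualAllocEx WriteProcessMemory CreateRemoteThread",
    "CreateFileW ReadFile CloseHandle CreateFileW WriteFile CloseHandle",
    "RegOpenKeyExW RegSetValueExW RegCloseKey",
]
docs = [t.split() for t in traces]
features = lda_gibbs(docs, num_topics=2, iters=100)
for row in features:
    print([round(p, 3) for p in row])
```

Fixing `seed` makes runs repeatable; in practice `num_topics` and the number of iterations would be tuned on held-out data before the topic proportions are passed to a classifier.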
Abstract
Dissemination of malicious code, also known as malware, poses severe challenges to cyber security. Malware authors embed malicious software in seemingly innocuous executables without the user's knowledge. The malware then interacts with security-critical OS resources on the host system or network, either to destroy the information held there or to gather sensitive information such as passwords and credit card numbers. Malware authors typically use Application Programming Interface (API) calls to perpetrate these crimes. We present a model that uses text mining and topic modeling to detect malware based on the types of API call sequences. We evaluated our technique on two publicly available datasets and observed that Decision Tree and Support Vector Machine yielded significant results. A t-test on the sensitivity of the two models showed no statistically significant difference between them. We recommend Decision Tree because it yields ‘if-then’ rules, which could be used as an early warning expert system.
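The abstract does not say which t-test variant was used. A minimal sketch, assuming a paired t-test over per-fold sensitivities from cross-validation, computes sensitivity (the true positive rate) and the paired t statistic with Python's standard library. Because the standard library has no Student's t distribution, the sketch compares |t| with a small table of two-sided 5% critical values. The helper names and the fold scores are hypothetical, made up for illustration, and are not results from the paper.

```python
import math
import statistics

def sensitivity(tp, fn):
    """True positive rate: fraction of malware samples the model flags."""
    return tp / (tp + fn)

def paired_t_statistic(a, b):
    """Paired t statistic for two models scored on the same folds."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("per-fold differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Two-sided critical values of Student's t at alpha = 0.05, keyed by degrees of freedom.
T_CRIT_05 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def differs_significantly(a, b):
    """True if |t| exceeds the 5% two-sided critical value for n - 1 df."""
    t = paired_t_statistic(a, b)
    return abs(t) > T_CRIT_05[len(a) - 1]  # KeyError if df is not tabulated

# Hypothetical per-fold sensitivities from 10-fold cross-validation (made up).
tree_sens = [0.95, 0.93, 0.96, 0.94, 0.95, 0.92, 0.96, 0.95, 0.94, 0.93]
svm_sens = [0.94, 0.94, 0.95, 0.95, 0.94, 0.93, 0.95, 0.96, 0.93, 0.92]
print(f"t = {paired_t_statistic(tree_sens, svm_sens):.3f}")
print("significant at 5%:", differs_significantly(tree_sens, svm_sens))
```

With real fold scores, a library routine such as SciPy's `scipy.stats.ttest_rel` would give an exact p-value instead of the table lookup; an unpaired comparison would need a two-sample test instead.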


Citations
Journal Article

A survey of the applications of text mining in financial domain

TL;DR: A state-of-the-art survey of various applications of Text mining to finance, categorized broadly into FOREX rate prediction, stock market prediction, customer relationship management (CRM) and cyber security.
Journal Article

Malicious sequential pattern mining for automatic malware detection

TL;DR: An effective framework using a sequence mining technique and an All-Nearest-Neighbor (ANN) classifier is constructed for malware detection based on the discovered patterns; experimental results show that the framework outperforms other data-mining-based detection methods in identifying new malicious executables.
Posted Content

A Survey on Sensor-based Threats to Internet-of-Things (IoT) Devices and Applications

TL;DR: This survey explores various threats targeting IoT devices, discusses how their sensors can be abused for malicious purposes, and reviews existing sensor-based threats together with countermeasures developed specifically to secure the sensors of IoT devices.
Journal Article

Graph embedding as a new approach for unknown malware detection

TL;DR: The main advantages of the proposed method are a high detection rate despite using simple classifiers such as KNN, computational complexity that remains acceptable on large-scale datasets compared with rival methods, and a low false positive rate.
Journal Article

A Survey on Sensor-Based Threats and Attacks to Smart Devices and Applications

TL;DR: This paper presents a detailed survey of existing sensor-based threats and attacks on smart devices and of countermeasures that have been developed to secure smart devices from sensor-based threats.