Journal ArticleDOI

A New Malware Detection System Using Machine Learning Techniques for API Call Sequences

02 Jan 2018-Journal of Applied Security Research (Routledge)-Vol. 13, Iss: 1, pp 45-62
TL;DR: An efficient system for detecting malware in Application Programming Interfaces (APIs) and classifying its type as worm, virus, Trojan, or normal, using the Multi-Dimensional Naïve Bayes Classification (MDNBS).
Abstract: The detection and classification of malware in Windows executables is an important and demanding task in the field of data mining. Malware can easily damage a user's system, so a number of techniques have been developed in earlier work for accurate malware detection. However, these techniques suffer from major drawbacks such as inaccurate detection, low efficiency, long classification times, and increased computational complexity. To solve these issues, this article develops an efficient system for detecting malware in Application Programming Interfaces (APIs) and classifying its type as worm, virus, Trojan, or normal. Initially, the input dataset is preprocessed by normalizing the data, then its upper and lower boundaries are estimated during feature extraction. Furthermore, the Rete algorithm is implemented to generate the rules based on the pattern matching process. Here, the Multi-Dimensional Naive Bayes...
Citations
Journal ArticleDOI
TL;DR: A ransomware detection method is proposed that can distinguish between ransomware and benign files as well as between ransomware and malware; the experimental results show that the method can detect ransomware among malware and benign files.
Abstract: The number of ransomware variants has increased rapidly every year, and ransomware needs to be distinguished from the other types of malware to protect users' machines from ransomware‐based attacks. Ransomware is similar to other types of malware in some aspects, but other characteristics are clearly different. For example, ransomware generally conducts a large number of file‐related operations in a short period of time to lock or to encrypt files of a victim's machine. The signature‐based malware detection methods, which have difficulties to detect zero‐day ransomware, are not suitable to protect users' files against the attacks caused by risky unknown ransomware. Therefore, a new protection mechanism specialized for ransomware is needed, and the mechanism should focus on ransomware‐specific operations to distinguish ransomware from other types of malware as well as benign files. This paper proposes a ransomware detection method that can distinguish between ransomware and benign files as well as between ransomware and malware. The experimental results show that our proposed method can detect ransomware among malware and benign files.

49 citations

Journal ArticleDOI
TL;DR: In this article, a malware detection model is developed using the LSSVM (Least Squares Support Vector Machine) learning approach with three distinct kernel functions: linear, radial basis, and polynomial.
Abstract: With the recognition of free apps, Android has become the most widely used smartphone operating system these days and it naturally invited cyber-criminals to build malware-infected apps that can steal vital information from these devices. The most critical problem is to detect malware-infected apps and keep them out of the Google Play Store. The vulnerability lies in the underlying permission model of Android apps. Consequently, it has become the responsibility of the app developers to precisely specify the permissions which are going to be demanded by the apps during their installation and execution time. In this study, we examine the permission-induced risk which begins by giving unnecessary permissions to these Android apps. The experimental work done in this research paper includes the development of an effective malware detection system which helps to determine and investigate the detective influence of numerous well-known and broadly used sets of features for malware detection. To select the best features from our collected feature data set, we implement ten distinct feature selection approaches. Further, we developed the malware detection model by utilizing the LSSVM (Least Squares Support Vector Machine) learning approach connected through three distinct kernel functions, i.e., linear, radial basis, and polynomial. Experiments were performed by using 200,000 distinct Android apps. Empirical results reveal that the model built by utilizing LSSVM with RBF (i.e., radial basis kernel function), named FSdroid, is able to detect 98.8% of malware when compared to distinct anti-virus scanners and also achieved a 3% higher detection rate when compared to different frameworks or approaches proposed in the literature.

33 citations

Journal ArticleDOI
TL;DR: A feature representation taxonomy is introduced in addition to the deeper taxonomy of malware analysis and detection approaches and links each approach with the most commonly used data types and introduces the feature extraction method according to the techniques used instead of the analysis approach.
Abstract: The evolution of recent malicious software with the rising use of digital services has increased the probability of corrupting data, stealing information, or other cybercrimes by malware attacks. Therefore, malicious software must be detected before it impacts a large number of computers. Recently, many malware detection solutions have been proposed by researchers. However, many challenges limit these solutions to effectively detecting several types of malware, especially zero-day attacks due to obfuscation and evasion techniques, as well as the diversity of malicious behavior caused by the rapid rate of new malware and malware variants being produced every day. Several review papers have explored the issues and challenges of malware detection from various viewpoints. However, there is a lack of a deep review article that associates each analysis and detection approach with the data type. Such an association is imperative for the research community as it helps to determine the suitable mitigation approach. In addition, the current survey articles stopped at a generic detection approach taxonomy. Moreover, some review papers presented the feature extraction methods as static, dynamic, and hybrid based on the utilized analysis approach and neglected the feature representation methods taxonomy, which is considered essential in developing the malware detection model. This survey bridges the gap by providing a comprehensive state-of-the-art review of malware detection model research. This survey introduces a feature representation taxonomy in addition to the deeper taxonomy of malware analysis and detection approaches and links each approach with the most commonly used data types. The feature extraction method is introduced according to the techniques used instead of the analysis approach. The survey ends with a discussion of the challenges and future research directions.

29 citations

Journal ArticleDOI
TL;DR: A new feature extraction method that extracts features independent of the names of system calls, which makes it suitable for anomaly detection across platforms.
Abstract: Context: In host-based anomaly detection, feature extraction on the system call traces is important to build an effective anomaly detection model. Different kinds of feature extraction methods are recently proposed and most of them aim at preserving the positional information of the system calls within a trace. These extracted features are generally named from system calls, therefore, cannot be used directly in the case of cross platform applications. In addition, some of these feature extraction methods are very costly to implement. Objective: This paper presents a new feature extraction method. It aims at extracting features that are irrelevant to the names of system calls. The samples represented by the extracted features can be directly used in the case of cross platform applications. In addition, this method is lightweight in that the feature values are not expensive to compute. Method: The proposed method firstly transforms the system calls in a trace into frequency sequences of n-grams and then explores a fixed number of statistical features on the frequency sequences. The extracted features are irrelevant to the names/indexes of system calls on a platform. The calculation of feature values works on the frequency sequences rather than on system call sequences. These feature vectors built on the training set with only normal data are then used to train a one class classification model for anomaly detection. Results: We compared our method with four previously proposed feature extraction methods on system call traces. When used on the same platform, even though our method does not always obtain the highest AUC, overall, it performs better than all the compared methods. When testing on cross platform, it performs the best among all compared methods. Conclusion: The features extracted by our method are platform-independent and are suitable for anomaly detection across platforms.

24 citations
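
The method in the entry above (n-gram frequency sequences followed by a fixed number of statistical features) can be illustrated with a minimal Python sketch. The abstract does not give the exact construction, so the details here are assumptions: each n-gram position is replaced by that n-gram's count within the trace, and the four summary statistics (minimum, maximum, mean, population standard deviation) are only an illustrative choice. The function names and the two toy traces are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def ngram_frequency_sequence(trace, n=2):
    """Replace each n-gram in the trace by its count within that trace (assumed construction)."""
    ngrams = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    counts = Counter(ngrams)
    return [counts[g] for g in ngrams]


def statistical_features(freq_seq):
    """Fixed-length summary of a frequency sequence (illustrative choice of statistics)."""
    if not freq_seq:
        return [0.0, 0.0, 0.0, 0.0]
    return [float(min(freq_seq)), float(max(freq_seq)),
            float(mean(freq_seq)), float(pstdev(freq_seq))]


# Two hypothetical traces with the same structure but different call names.
trace_a = ["open", "read", "read", "read", "close"]
trace_b = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtClose"]

features_a = statistical_features(ngram_frequency_sequence(trace_a))
features_b = statistical_features(ngram_frequency_sequence(trace_b))
print(features_a, features_b)
```

Because the features depend only on how often n-grams repeat, the two toy traces map to the same feature vector even though their call names differ. The one-class classification step described in the abstract is not shown here.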

Journal ArticleDOI
01 Jan 2023-Symmetry
TL;DR: In this paper, a high-performance malware detection system using deep learning and feature selection methodologies is introduced, where two different malware datasets are used to detect malware and differentiate it from benign activities.
Abstract: Malware is one of the most frequent cyberattacks, with its prevalence growing daily across the network. Malware traffic is always asymmetrical compared to benign traffic, which is always symmetrical. Fortunately, there are many artificial intelligence techniques that can be used to detect malware and distinguish it from normal activities. However, the problem of dealing with large and high-dimensional data has not been addressed enough. In this paper, a high-performance malware detection system using deep learning and feature selection methodologies is introduced. Two different malware datasets are used to detect malware and differentiate it from benign activities. The datasets are preprocessed, and then correlation-based feature selection is applied to produce different feature-selected datasets. The dense and LSTM-based deep learning models are then trained using these different versions of feature-selected datasets. The trained models are then evaluated using many performance metrics (accuracy, precision, recall, and F1-score). The results indicate that some feature-selected scenarios preserve almost the same original dataset performance. The different nature of the used datasets shows different levels of performance changes. For the first dataset, the feature reduction ratios range from 18.18% to 42.42%, with performance degradation of 0.07% to 5.84%, respectively. The second dataset reduction rate is between 81.77% and 93.5%, with performance degradation of 3.79% and 9.44%, respectively.

18 citations
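
The correlation-based feature selection step in the entry above can be sketched in Python. The abstract does not say how the correlation criterion is applied, so this sketch assumes one common variant: a feature is kept when the absolute Pearson correlation between its values and the class label meets a threshold. The feature names, toy values, and threshold are hypothetical, and the reduction ratio is computed as the percentage of features removed, which is our reading of the term.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns, labels, threshold):
    """Keep the names of features whose |correlation with the label| >= threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= threshold]


def reduction_ratio(n_original, n_kept):
    """Percentage of features removed by the selection step."""
    return 100.0 * (n_original - n_kept) / n_original


# Toy data: 1 = malware, 0 = benign (all values are made up for illustration).
labels = [0, 0, 1, 1]
columns = {
    "packet_count": [1, 2, 8, 9],   # strongly related to the label
    "constant_flag": [1, 1, 1, 1],  # carries no information
    "noise": [3, 1, 2, 3],          # weakly related to the label
}

kept = select_features(columns, labels, threshold=0.5)
print(kept, reduction_ratio(len(columns), len(kept)))
```

Raising or lowering the threshold produces the different feature-selected versions of a dataset that the abstract describes, each with its own reduction ratio.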

References
Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper examines the feasibility of building a malware detector in hardware using existing performance counters and finds that data from performance counters can be used to identify malware and that the detection techniques are robust to minor variations in malware programs.
Abstract: The proliferation of computers in any domain is followed by the proliferation of malware in that domain. Systems, including the latest mobile platforms, are laden with viruses, rootkits, spyware, adware and other classes of malware. Despite the existence of anti-virus software, malware threats persist and are growing as there exist a myriad of ways to subvert anti-virus (AV) software. In fact, attackers today exploit bugs in the AV software to break into systems. In this paper, we examine the feasibility of building a malware detector in hardware using existing performance counters. We find that data from performance counters can be used to identify malware and that our detection techniques are robust to minor variations in malware programs. As a result, after examining a small set of variations within a family of malware on Android ARM and Intel Linux platforms, we can detect many variations within that family. Further, our proposed hardware modifications allow the malware detector to run securely beneath the system software, thus setting the stage for AV implementations that are simpler and less buggy than software AV. Combined, the robustness and security of hardware AV techniques have the potential to advance state-of-the-art online malware detection.

399 citations

Journal ArticleDOI
TL;DR: An online deep-learning-based Android malware detection engine (DroidDetector) that can automatically detect whether an app is a malware or not is implemented and shows that deep learning is suitable for characterizing Android malware and especially effective with the availability of more training data.

357 citations

Proceedings ArticleDOI
22 Aug 2012
TL;DR: A machine learning-based system for the detection of malware on Android devices that extracts a number of features and trains a One-Class Support Vector Machine in an offline (off-device) manner, in order to leverage the higher computing power of a server or cluster of servers.
Abstract: With the recent emergence of mobile platforms capable of executing increasingly complex software and the rising ubiquity of using mobile platforms in sensitive applications such as banking, there is a rising danger associated with malware targeted at mobile devices. The problem of detecting such malware presents unique challenges due to the limited resources available and limited privileges granted to the user, but also presents unique opportunity in the required metadata attached to each application. In this article, we present a machine learning-based system for the detection of malware on Android devices. Our system extracts a number of features and trains a One-Class Support Vector Machine in an offline (off-device) manner, in order to leverage the higher computing power of a server or cluster of servers.

322 citations


"A New Malware Detection System Usin..." refers to methods in this paper

  • ...So, the machine learning techniques are mostly developed to protect the system by identifying the malwares (Sahs & Khan, 2012) during the process of detection....


Proceedings ArticleDOI
04 Nov 2013
TL;DR: This paper proposes a method for malware detection based on efficient embeddings of function call graphs with an explicit feature map inspired by a linear-time graph kernel that outperforms several related approaches and detects 89% of the malware with few false alarms, while also allowing to pin-point malicious code structures within Android applications.
Abstract: The number of malicious applications targeting the Android system has literally exploded in recent years. While the security community, well aware of this fact, has proposed several methods for detection of Android malware, most of these are based on permission and API usage or the identification of expert features. Unfortunately, many of these approaches are susceptible to instruction level obfuscation techniques. Previous research on classic desktop malware has shown that some high level characteristics of the code, such as function call graphs, can be used to find similarities between samples while being more robust against certain obfuscation strategies. However, the identification of similarities in graphs is a non-trivial problem whose complexity hinders the use of these features for malware detection. In this paper, we explore how recent developments in machine learning classification of graphs can be efficiently applied to this problem. We propose a method for malware detection based on efficient embeddings of function call graphs with an explicit feature map inspired by a linear-time graph kernel. In an evaluation with 12,158 malware samples our method, purely based on structural features, outperforms several related approaches and detects 89% of the malware with few false alarms, while also allowing to pin-point malicious code structures within Android applications.

311 citations

Journal ArticleDOI
01 Jan 2016
TL;DR: An alternative solution to evaluating malware detection using the anomaly-based approach with machine learning classifiers is proposed, which revealed that the k-nearest neighbor classifier efficiently detected the latest Android malware with an 84.57 % true-positive rate higher than other classifiers.
Abstract: Mobile devices have become a significant part of people's lives, leading to an increasing number of users involved with such technology. The rising number of users invites hackers to generate malicious applications. Besides, the security of sensitive data available on mobile devices is taken lightly. Relying on currently developed approaches is not sufficient, given that intelligent malware keeps modifying rapidly and as a result becomes more difficult to detect. In this paper, we propose an alternative solution to evaluating malware detection using the anomaly-based approach with machine learning classifiers. Among the various network traffic features, the four categories selected are basic information, content based, time based and connection based. The evaluation utilizes two datasets: public (i.e. MalGenome) and private (i.e. self-collected). Based on the evaluation results, both the Bayes network and random forest classifiers produced more accurate readings, with a 99.97 % true-positive rate (TPR) as opposed to the multi-layer perceptron with only 93.03 % on the MalGenome dataset. However, this experiment revealed that the k-nearest neighbor classifier efficiently detected the latest Android malware with an 84.57 % true-positive rate higher than other classifiers.

294 citations
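
The true-positive rate (TPR) quoted in the entry above is the fraction of actual malware samples that a classifier flags as malware, TP / (TP + FN). A minimal Python sketch of that calculation follows; the function name and the labels are made up for illustration (1 = malware, 0 = benign) and are not taken from the paper's datasets.

```python
def true_positive_rate(y_true, y_pred, positive=1):
    """TPR = TP / (TP + FN), computed over the samples whose true label is positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


# Hypothetical ground truth and predictions for six apps.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

tpr = true_positive_rate(y_true, y_pred)
print(f"TPR = {tpr:.2%}")
```

Note that TPR ignores benign samples entirely, so the false alarm on the last app does not lower it; a false-positive rate would be needed to capture that.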