scispace - formally typeset
Search or ask a question
Journal ArticleDOI

A Comprehensive Review on Malware Detection Approaches

03 Jan 2020-IEEE Access (IEEE)-Vol. 8, pp 6249-6271
TL;DR: This paper presents a detailed review on malware detection approaches and recent detection methods which use these approaches, and the pros and cons of each detection approach, and methods that are used in these approaches.
Abstract: According to the recent studies, malicious software (malware) is increasing at an alarming rate, and some malware can hide in the system by using different obfuscation techniques. In order to protect computer systems and the Internet from the malware, the malware needs to be detected before it affects a large number of systems. Recently, there have been made several studies on malware detection approaches. However, the detection of malware still remains problematic. Signature-based and heuristic-based detection approaches are fast and efficient to detect known malware, but especially signature-based detection approach has failed to detect unknown malware. On the other hand, behavior-based, model checking-based, and cloud-based approaches perform well for unknown and complicated malware; and deep learning-based, mobile devices-based, and IoT-based approaches also emerge to detect some portion of known and unknown malware. However, no approach can detect all malware in the wild. This shows that to build an effective method to detect malware is a very challenging task, and there is a huge gap for new studies and methods. This paper presents a detailed review on malware detection approaches and recent detection methods which use these approaches. Paper goal is to help researchers to have a general idea of the malware detection approaches, pros and cons of each detection approach, and methods that are used in these approaches.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: A feature fusion method to combine the features extracted from pre-trained AlexNet and Inception-v3 deep neural networks with features attained using segmentation-based fractal texture analysis (SFTA) of images representing the malware code to build a multimodal representation of malicious code.
Abstract: As the number of internet users increases so does the number of malicious attacks using malware. The detection of malicious code is becoming critical, and the existing approaches need to be improved. Here, we propose a feature fusion method to combine the features extracted from pre-trained AlexNet and Inception-v3 deep neural networks with features attained using segmentation-based fractal texture analysis (SFTA) of images representing the malware code. In this work, we use distinctive pre-trained models (AlexNet and Inception-V3) for feature extraction. The purpose of deep convolutional neural network (CNN) feature extraction from two models is to improve the malware classifier accuracy, because both models have characteristics and qualities to extract different features. This technique produces a fusion of features to build a multimodal representation of malicious code that can be used to classify the grayscale images, separating the malware into 25 malware classes. The features that are extracted from malware images are then classified using different variants of support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and other classifiers. To improve the classification results, we also adopted data augmentation based on affine image transforms. The presented method is evaluated on a Malimg malware image dataset, achieving an accuracy of 99.3%, which makes it the best among the competing approaches.

79 citations

Journal ArticleDOI
TL;DR: A detailed meta-review of the existing surveys related to malware and its detection techniques, showing an arms race between these two sides of a barricade, is presented in this article.
Abstract: Cyber attacks are currently blooming, as the attackers reap significant profits from them and face a limited risk when compared to committing the “classical” crimes. One of the major components that leads to the successful compromising of the targeted system is malicious software. It allows using the victim’s machine for various nefarious purposes, e.g., making it a part of the botnet, mining cryptocurrencies, or holding hostage the data stored there. At present, the complexity, proliferation, and variety of malware pose a real challenge for the existing countermeasures and require their constant improvements. That is why, in this paper we first perform a detailed meta-review of the existing surveys related to malware and its detection techniques, showing an arms race between these two sides of a barricade. On this basis, we review the evolution of modern threats in the communication networks, with a particular focus on the techniques employing information hiding. Next, we present the bird’s eye view portraying the main development trends in detection methods with a special emphasis on the machine learning techniques. The survey is concluded with the description of potential future research directions in the field of malware detection.

63 citations

Journal ArticleDOI
TL;DR: An ensemble classification-based methodology for malware detection is proposed, with the best performance achieved by an ensemble of five dense and CNN neural networks, and the ExtraTrees classifier as a meta-learner.
Abstract: The security of information is among the greatest challenges facing organizations and institutions. Cybercrime has risen in frequency and magnitude in recent years, with new ways to steal, change and destroy information or disable information systems appearing every day. Among the types of penetration into the information systems where confidential information is processed is malware. An attacker injects malware into a computer system, after which he has full or partial access to critical information in the information system. This paper proposes an ensemble classification-based methodology for malware detection. The first-stage classification is performed by a stacked ensemble of dense (fully connected) and convolutional neural networks (CNN), while the final stage classification is performed by a meta-learner. For a meta-learner, we explore and compare 14 classifiers. For a baseline comparison, 13 machine learning methods are used: K-Nearest Neighbors, Linear Support Vector Machine (SVM), Radial basis function (RBF) SVM, Random Forest, AdaBoost, Decision Tree, ExtraTrees, Linear Discriminant Analysis, Logistic, Neural Net, Passive Classifier, Ridge Classifier and Stochastic Gradient Descent classifier. We present the results of experiments performed on the Classification of Malware with PE headers (ClaMP) dataset. The best performance is achieved by an ensemble of five dense and CNN neural networks, and the ExtraTrees classifier as a meta-learner.

47 citations

Journal ArticleDOI
TL;DR: This work aims at systematically reviewing and analyzing the research landscape about DL approaches applied to different IoT security scenarios and characterized these studies according to three main research questions, namely, the involved security aspects, the used DL network architectures, and the engaged datasets.

44 citations

Journal ArticleDOI
TL;DR: In this paper, a novel deep learning-based architecture is proposed which can classify malware variants based on a hybrid model, which integrates two wide-ranging pre-trained network models in an optimized manner.
Abstract: Recent technological developments in computer systems transfer human life from real to virtual environments. Covid-19 disease has accelerated this process. Cyber criminals’ interest has shifted in a real to virtual life as well. This is because it is easier to commit a crime in cyberspace rather than regular life. Malicious software (malware) is unwanted software which is frequently used by cyber criminals to launch cyber-attacks. Malware variants are continuing to evolve by using advanced obfuscation and packing techniques. These concealing techniques make malware detection and classification significantly challenging. Novel methods which are quite different from traditional methods must be used to effectively combat with new malware variants. Traditional artificial intelligence (AI) specifically machine learning (ML) algorithms are no longer effective in detecting all new and complex malware variants. Deep learning (DL) approach which is quite different from traditional ML algorithms can be a promising solution to the problem of detecting all variants of malware. In this study, a novel deep-learning-based architecture is proposed which can classify malware variants based on a hybrid model. The main contribution of the study is to propose a new hybrid architecture which integrates two wide-ranging pre-trained network models in an optimized manner. This architecture consists of four main stages, namely: data acquisition, the design of deep neural network architecture, training of the proposed deep neural network architecture, and evaluation of the trained deep neural network. The proposed method tested on Malimg, Microsoft BIG 2015, and Malevis datasets. The experimental results show that the suggested method can effectively classify malware with high accuracy which outperforms the state of the art methods in the literature. When proposed method tested on Malimg dataset, 97.78% accuracy is obtained which is outperformed most of the ML-based malware detection method.

44 citations

References
More filters
Book
01 Jan 2009
TL;DR: The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer modelssuch as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks are discussed.
Abstract: Can machine learning deliver AI? Theoretical results, inspiration from the brain and cognition, as well as machine learning experiments suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one would need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers, graphical models with many levels of latent variables, or in complicated propositional formulae re-using many sub-formulae. Each level of the architecture represents features at a different level of abstraction, defined as a composition of lower-level features. Searching the parameter space of deep architectures is a difficult task, but new algorithms have been discovered and a new sub-area has emerged in the machine learning community since 2006, following these discoveries. Learning algorithms such as those for Deep Belief Networks and other related unsupervised learning algorithms have recently been proposed to train deep architectures, yielding exciting results and beating the state-of-the-art in certain areas. Learning Deep Architectures for AI discusses the motivations for and principles of learning algorithms for deep architectures. By analyzing and comparing recent results with different learning algorithms for deep architectures, explanations for their success are proposed and discussed, highlighting challenges and suggesting avenues for future explorations in this area.

7,767 citations


"A Comprehensive Review on Malware D..." refers background in this paper

  • ...the deep belief network (DBN) [94] that utilizes the built restricted Boltzmannmachines (RBM)which is beneficial for better characterizing Android apps....

    [...]

  • ...In the pre-training phase, they adopted VOLUME 8, 2020 6261 the deep belief network (DBN) [94] that utilizes the built restricted Boltzmannmachines (RBM)which is beneficial for better characterizing Android apps....

    [...]

Proceedings ArticleDOI
08 Jul 2009
TL;DR: A new data set is proposed, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of mentioned shortcomings.
Abstract: During the last decade, anomaly detection has attracted the attention of many researchers to overcome the weakness of signature-based IDSs in detecting novel attacks, and KDDCUP'99 is the mostly widely used data set for the evaluation of these systems. Having conducted a statistical analysis on this data set, we found two important issues which highly affects the performance of evaluated systems, and results in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of mentioned shortcomings.

3,300 citations


"A Comprehensive Review on Malware D..." refers methods in this paper

  • ...• NSL-KDD dataset (2009): It is an updated version of the KDD’99 dataset which consists of approximately 125,000 records and 41 features [22]....

    [...]

Proceedings ArticleDOI
01 Jan 2014
TL;DR: DREBIN is proposed, a lightweight method for detection of Android malware that enables identifying malicious applications directly on the smartphone and outperforms several related approaches and detects 94% of the malware with few false alarms.
Abstract: Malicious applications pose a threat to the security of the Android platform. The growing amount and diversity of these applications render conventional defenses largely ineffective and thus Android smartphones often remain unprotected from novel malware. In this paper, we propose DREBIN, a lightweight method for detection of Android malware that enables identifying malicious applications directly on the smartphone. As the limited resources impede monitoring applications at run-time, DREBIN performs a broad static analysis, gathering as many features of an application as possible. These features are embedded in a joint vector space, such that typical patterns indicative for malware can be automatically identified and used for explaining the decisions of our method. In an evaluation with 123,453 applications and 5,560 malware samples DREBIN outperforms several related approaches and detects 94% of the malware with few false alarms, where the explanations provided for each detection reveal relevant properties of the detected malware. On five popular smartphones, the method requires 10 seconds for an analysis on average, rendering it suitable for checking downloaded applications directly on the device.

1,905 citations


"A Comprehensive Review on Malware D..." refers background in this paper

  • ...• Drebin dataset (2014): This dataset is created for smart phones to examine the effectiveness of the existing antivirus software [23]....

    [...]

Proceedings ArticleDOI
14 May 2001
TL;DR: This work presents a data mining framework that detects new, previously unseen malicious executables accurately and automatically and more than doubles the current detection rates for new malicious executable.
Abstract: A serious security threat today is malicious executables, especially new, unseen malicious executables often arriving as email attachments. These new malicious executables are created at the rate of thousands every year and pose a serious security threat. Current anti-virus systems attempt to detect these new malicious programs with heuristics generated by hand. This approach is costly and oftentimes ineffective. We present a data mining framework that detects new, previously unseen malicious executables accurately and automatically. The data mining framework automatically found patterns in our data set and used these patterns to detect a set of new malicious binaries. Comparing our detection methods with a traditional signature-based method, our method more than doubles the current detection rates for new malicious executables.

1,106 citations

Proceedings ArticleDOI
17 Oct 2011
TL;DR: The method is shown to be an effective means of isolating the malware and alerting the users of a downloaded malware, showing the potential for avoiding the spreading of a detected malware to a larger community.
Abstract: The sharp increase in the number of smartphones on the market, with the Android platform posed to becoming a market leader makes the need for malware analysis on this platform an urgent issue.In this paper we capitalize on earlier approaches for dynamic analysis of application behavior as a means for detecting malware in the Android platform. The detector is embedded in a overall framework for collection of traces from an unlimited number of real users based on crowdsourcing. Our framework has been demonstrated by analyzing the data collected in the central server using two types of data sets: those from artificial malware created for test purposes, and those from real malware found in the wild. The method is shown to be an effective means of isolating the malware and alerting the users of a downloaded malware. This shows the potential for avoiding the spreading of a detected malware to a larger community.

1,035 citations

Trending Questions (1)
How do I scan a pixel for malware?

This shows that to build an effective method to detect malware is a very challenging task, and there is a huge gap for new studies and methods.