Journal Article

A survey of malware behavior description and analysis

Bo Yu, Fang Ying, Qiang Yang, Yong Tang, Liu Liu
20 Jul 2018 - Journal of Zhejiang University Science C (Zhejiang University Press) - Vol. 19, Iss. 5, pp. 583-603
TL;DR: This paper conducts a survey on malware behavior description and analysis considering three aspects: malware behavior description, behavior analysis methods, and visualization techniques.
Abstract: Behavior-based malware analysis is an important technique for automatically analyzing and detecting malware, and it has received considerable attention from both academic and industrial communities. By considering how malware behaves, we can tackle the malware obfuscation problem, which cannot be processed by traditional static analysis approaches, and we can also derive the as-built behavior specifications and cover the entire behavior space of the malware samples. Although there have been several works focusing on malware behavior analysis, such research is far from mature, and no overviews have been put forward to date to investigate current developments and challenges. In this paper, we conduct a survey on malware behavior description and analysis considering three aspects: malware behavior description, behavior analysis methods, and visualization techniques. First, existing behavior data types and emerging techniques for malware behavior description are explored, especially the goals, principles, characteristics, and classifications of behavior analysis techniques proposed in the existing approaches. Second, the inadequacies and challenges in malware behavior analysis are summarized from different perspectives. Finally, several possible directions are discussed for future research.
Citations
Journal Article
TL;DR: A dynamic analysis for IoT malware detection (DAIMD) is proposed to reduce damage to IoT devices by detecting both well-known IoT malware and new and variant IoT malware evolved intelligently.
Abstract: Internet of Things (IoT) technology provides the basic infrastructure for a hyper-connected society where all things are connected and exchange information through the Internet. IoT technology is fused with 5G and artificial intelligence (AI) technologies for use in various fields such as the smart city and smart factory. As the demand for IoT technology increases, security threats against IoT infrastructure, applications, and devices have also increased. A variety of studies have been conducted on the detection of IoT malware to avoid the threats posed by malicious code. While existing models may accurately detect malicious IoT code identified through static analysis, detecting the new and variant IoT malware that is rapidly being generated can be challenging. This paper proposes a dynamic analysis for IoT malware detection (DAIMD) to reduce damage to IoT devices by detecting both well-known IoT malware and new and variant IoT malware evolved intelligently. The DAIMD scheme learns IoT malware using a convolutional neural network (CNN) model and analyzes IoT malware dynamically in a nested cloud environment. DAIMD performs dynamic analysis on IoT malware in a nested cloud environment to extract behaviors related to memory, network, virtual file system, process, and system call. The extracted and analyzed behavior data are converted into images, and the resulting behavior images of IoT malware are learned and classified by the CNN. DAIMD can minimize the infection damage of IoT devices from malware by visualizing and learning the vast amount of behavior data generated through dynamic analysis.
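The conversion of behavior data into images described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how DAIMD encodes behaviors as pixels, so the min-max scaling, the zero padding, the function name `behavior_to_image`, and the example counts below are all assumptions chosen for illustration. The CNN training step is not shown.

```python
from typing import List, Sequence


def behavior_to_image(features: Sequence[float], side: int) -> List[List[int]]:
    """Map a flat behavior feature vector to a side x side grayscale image.

    Illustrative encoding only: the actual DAIMD encoding is not described
    in the abstract.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    if len(features) > side * side:
        raise ValueError("feature vector does not fit in the image")

    lo = min(features, default=0.0)
    hi = max(features, default=0.0)
    span = hi - lo

    # Min-max scale each feature to an 8-bit pixel intensity (0..255).
    if span == 0:
        pixels = [0 for _ in features]
    else:
        pixels = [round(255 * (value - lo) / span) for value in features]

    # Zero-pad so that the vector fills the square exactly.
    pixels += [0] * (side * side - len(pixels))

    # Cut the flat pixel list into rows.
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


# Hypothetical per-sample counts: system calls, opened files, sockets,
# spawned processes, memory mappings.
image = behavior_to_image([120, 4, 9, 2, 60], side=3)
```

Each row of `image` is one row of pixels. A real pipeline would stack such images into a tensor and feed them to a CNN classifier.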

81 citations


Cites background from "A survey of malware behavior descri..."

  • ...culty detecting obfuscated malware using packing and identifying the overall functions of malware, which are drawbacks [16], [17], [19], [21]–[23], [26], [27]....


Journal Article
TL;DR: A detailed meta-review of the existing surveys related to malware and its detection techniques, showing an arms race between these two sides of a barricade, is presented in this article.
Abstract: Cyber attacks are currently blooming, as the attackers reap significant profits from them and face a limited risk when compared to committing the “classical” crimes. One of the major components that leads to the successful compromising of the targeted system is malicious software. It allows using the victim’s machine for various nefarious purposes, e.g., making it a part of the botnet, mining cryptocurrencies, or holding hostage the data stored there. At present, the complexity, proliferation, and variety of malware pose a real challenge for the existing countermeasures and require their constant improvements. That is why, in this paper we first perform a detailed meta-review of the existing surveys related to malware and its detection techniques, showing an arms race between these two sides of a barricade. On this basis, we review the evolution of modern threats in the communication networks, with a particular focus on the techniques employing information hiding. Next, we present the bird’s eye view portraying the main development trends in detection methods with a special emphasis on the machine learning techniques. The survey is concluded with the description of potential future research directions in the field of malware detection.

63 citations

Journal Article
TL;DR: Two novel techniques, incremental bagging (iBagging) and enhanced semi-random subspace selection (ESRS), are proposed and incorporated into an ensemble-based detection model that achieves higher detection accuracy than existing solutions.

52 citations

Journal Article
TL;DR: A Dynamic Pre-encryption Boundary Delineation and Feature Extraction (DPBD-FE) scheme is proposed that determines the boundary of the pre-encryption phase, from which features are extracted and selected more accurately than in related works.
Abstract: The cryptography employed against user files makes the effect of crypto-ransomware attacks irreversible even after detection and removal. Thus, detecting such attacks early, i.e., during the pre-encryption phase before the encryption takes place, is necessary. Existing crypto-ransomware early detection solutions use a fixed time-based thresholding approach to determine the pre-encryption phase boundaries. However, the fixed time thresholding approach implies that all samples start the encryption at the same time. Such an assumption does not necessarily hold for all samples, as the time at which the main sabotage starts varies among crypto-ransomware families: the obfuscation techniques that malware employs to change its attack strategies and evade detection generate different attack behaviors. Additionally, the lack of sufficient data at the early phases of the attack adversely affects the ability of feature extraction techniques in early detection models to perceive the characteristics of the attacks, which consequently decreases the detection accuracy. Therefore, this paper proposes a Dynamic Pre-encryption Boundary Delineation and Feature Extraction (DPBD-FE) scheme that determines the boundary of the pre-encryption phase, from which the features are extracted and selected more accurately. Unlike the fixed thresholding employed by extant works, DPBD-FE tracks the pre-encryption phase for each instance individually based on the first occurrence of any cryptography-related API. Then, an annotated Term Frequency-Inverse Document Frequency (aTF-IDF) technique is used to extract the features from runtime data generated during the pre-encryption phase of crypto-ransomware attacks. The aTF-IDF overcomes the challenge of insufficient attack patterns during the early phases of the attack lifecycle. The experimental evaluation shows that DPBD-FE was able to determine the pre-encryption boundaries and extract the features related to this phase more accurately compared to related works.
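The per-sample boundary idea in this abstract (cut each trace at the first cryptography-related API call, then weight the remaining calls) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the set `CRYPTO_APIS`, the helper names, and the example traces are hypothetical, and the weighting shown is plain TF-IDF because the abstract does not define the annotation step of aTF-IDF.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set

# Assumed set of cryptography-related API names; the paper's list is not given here.
CRYPTO_APIS: Set[str] = {"CryptEncrypt", "CryptGenKey", "CryptAcquireContextW"}


def pre_encryption_calls(trace: Sequence[str],
                         crypto_apis: Set[str] = CRYPTO_APIS) -> List[str]:
    """Return the calls made before the first cryptography-related API call."""
    for index, api in enumerate(trace):
        if api in crypto_apis:
            return list(trace[:index])
    return list(trace)  # no crypto call observed: keep the whole trace


def tf_idf(documents: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Plain TF-IDF over API-call documents (not the annotated variant)."""
    n_docs = len(documents)
    # Document frequency: in how many traces each API appears at least once.
    doc_freq = Counter(api for doc in documents for api in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            api: (count / total) * math.log(n_docs / doc_freq[api])
            for api, count in counts.items()
        })
    return weights


# Hypothetical traces from two samples that start encrypting at different points.
traces = [
    ["CreateFileW", "ReadFile", "CryptGenKey", "CryptEncrypt", "WriteFile"],
    ["FindFirstFileW", "CreateFileW", "Sleep", "CreateFileW", "CryptEncrypt"],
]
prefixes = [pre_encryption_calls(trace) for trace in traces]
weights = tf_idf(prefixes)
```

The two prefixes have different lengths (two calls and four calls), which is the point of a per-sample boundary: a fixed time or call-count threshold would cut both traces at the same place.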

25 citations


Cites methods from "A survey of malware behavior descri..."

  • ...Unlike traditional TF-IDF used by extant cryptoransomware like [2], [30], [42]–[46], the annotated TF-IDF (aTF-IDF) distinguishes the APIs that are called during the pre-encryption phase from those who are called during and after the encryption....


Book Chapter
08 Nov 2018
TL;DR: The main objective is to find more discriminative dynamic features for detecting malware executables by analyzing different dynamic features with common malware detection approaches and evaluating several dynamic feature-based malware detection and classification approaches.
Abstract: As the threat of malware to information systems increases, researchers strive to find alternative malware detection methods based on static, dynamic, and hybrid analysis. Because obfuscation techniques can bypass static analysis, dynamic methods have become more useful for detecting malware. Therefore, most research focuses on dynamic behavior analysis of malicious software. In this work, our main objective is to find more discriminative dynamic features to detect malware executables by analyzing different dynamic features with common malware detection approaches. Moreover, we separately analyze different features obtained in dynamic analysis, such as API calls, system library usage, and operations, to observe the contributions of these features to malware detection and classification success. For this purpose, we evaluate the performance of some dynamic feature-based malware detection and classification approaches using four data sets that contain real and synthetic malware executables.

23 citations

References
Proceedings Article
20 May 2012
TL;DR: Existing Android malware is systematically characterized from various aspects, including installation methods, activation mechanisms, and the nature of the carried malicious payloads; the results reveal that malware families are evolving rapidly to circumvent detection by existing mobile anti-virus software.
Abstract: The popularity and adoption of smart phones has greatly stimulated the spread of mobile malware, especially on the popular platforms such as Android. In light of their rapid growth, there is a pressing need to develop effective solutions. However, our defense capability is largely constrained by the limited understanding of these emerging mobile malware and the lack of timely access to related samples. In this paper, we focus on the Android platform and aim to systematize or characterize existing Android malware. Particularly, with more than one year effort, we have managed to collect more than 1,200 malware samples that cover the majority of existing Android malware families, ranging from their debut in August 2010 to recent ones in October 2011. In addition, we systematically characterize them from various aspects, including their installation methods, activation mechanisms as well as the nature of carried malicious payloads. The characterization and a subsequent evolution-based study of representative families reveal that they are evolving rapidly to circumvent the detection from existing mobile anti-virus software. Based on the evaluation with four representative mobile security software, our experiments show that the best case detects 79.6% of them while the worst case detects only 20.2% in our dataset. These results clearly call for the need to better develop next-generation anti-mobile-malware solutions.

2,122 citations

Proceedings Article
01 Jan 2014
TL;DR: DREBIN is proposed, a lightweight method for detection of Android malware that enables identifying malicious applications directly on the smartphone and outperforms several related approaches and detects 94% of the malware with few false alarms.
Abstract: Malicious applications pose a threat to the security of the Android platform. The growing amount and diversity of these applications render conventional defenses largely ineffective and thus Android smartphones often remain unprotected from novel malware. In this paper, we propose DREBIN, a lightweight method for detection of Android malware that enables identifying malicious applications directly on the smartphone. As the limited resources impede monitoring applications at run-time, DREBIN performs a broad static analysis, gathering as many features of an application as possible. These features are embedded in a joint vector space, such that typical patterns indicative for malware can be automatically identified and used for explaining the decisions of our method. In an evaluation with 123,453 applications and 5,560 malware samples DREBIN outperforms several related approaches and detects 94% of the malware with few false alarms, where the explanations provided for each detection reveal relevant properties of the detected malware. On five popular smartphones, the method requires 10 seconds for an analysis on average, rendering it suitable for checking downloaded applications directly on the device.
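The joint vector space and the per-detection explanations mentioned in this abstract can be illustrated with a small sketch. It assumes a linear scoring model, which makes each feature's contribution to the score directly readable. The feature strings, the weights, and the function names are hypothetical and are not taken from DREBIN itself; in practice the weights would be learned from labeled applications.

```python
from typing import Dict, List, Set, Tuple


def embed(app_features: Set[str], vocabulary: List[str]) -> List[int]:
    """Binary indicator vector: 1 where the app has the feature, else 0."""
    return [1 if name in app_features else 0 for name in vocabulary]


def score_and_explain(vector: List[int],
                      vocabulary: List[str],
                      weights: Dict[str, float],
                      top_k: int = 2) -> Tuple[float, List[Tuple[str, float]]]:
    """Linear score plus the top_k features that contributed most to it."""
    contributions = [
        (name, weights.get(name, 0.0))
        for name, bit in zip(vocabulary, vector)
        if bit
    ]
    score = sum(weight for _, weight in contributions)
    top = sorted(contributions, key=lambda item: item[1], reverse=True)[:top_k]
    return score, top


# Hypothetical feature vocabulary and hand-picked weights, for illustration only.
vocabulary = [
    "permission::SEND_SMS",
    "api_call::sendTextMessage",
    "permission::INTERNET",
    "intent::BOOT_COMPLETED",
]
weights = {
    "permission::SEND_SMS": 1.5,
    "api_call::sendTextMessage": 1.0,
    "permission::INTERNET": -0.25,
    "intent::BOOT_COMPLETED": 0.5,
}

app = {"permission::SEND_SMS", "permission::INTERNET", "api_call::sendTextMessage"}
vector = embed(app, vocabulary)
score, top = score_and_explain(vector, vocabulary, weights)
```

An application would be flagged when `score` exceeds a chosen threshold, and `top` lists the features that pushed the score highest, which is the kind of explanation the abstract refers to.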

1,905 citations

Proceedings Article
21 Oct 2011
TL;DR: In this article, the authors discuss an emerging field of study: adversarial machine learning (AML), the study of effective machine learning techniques against an adversarial opponent, and give a taxonomy for classifying attacks against online machine learning algorithms.
Abstract: In this paper (expanded from an invited talk at AISEC 2010), we discuss an emerging field of study: adversarial machine learning---the study of effective machine learning techniques against an adversarial opponent. In this paper, we: give a taxonomy for classifying attacks against online machine learning algorithms; discuss application-specific factors that limit an adversary's capabilities; introduce two models for modeling an adversary's capabilities; explore the limits of an adversary's knowledge about the algorithm, feature space, training, and input data; explore vulnerabilities in machine learning algorithms; discuss countermeasures against attacks; introduce the evasion challenge; and discuss privacy-preserving learning techniques.

947 citations

Proceedings Article
27 Oct 2008
TL;DR: Ether, a transparent and external approach to malware analysis, is proposed, which is motivated by the intuition that for a malware analyzer to be transparent, it must not induce any side-effects that are unconditionally detectable by malware.
Abstract: Malware has become the centerpiece of most security threats on the Internet. Malware analysis is an essential technology that extracts the runtime behavior of malware, and supplies signatures to detection systems and provides evidence for recovery and cleanup. The focal point in the malware analysis battle is how to detect versus how to hide a malware analyzer from malware during runtime. State-of-the-art analyzers reside in or emulate part of the guest operating system and its underlying hardware, making them easy to detect and evade. In this paper, we propose a transparent and external approach to malware analysis, which is motivated by the intuition that for a malware analyzer to be transparent, it must not induce any side-effects that are unconditionally detectable by malware. Our analyzer, Ether, is based on a novel application of hardware virtualization extensions such as Intel VT, and resides completely outside of the target OS environment. Thus, there are no in-guest software components vulnerable to detection, and there are no shortcomings that arise from incomplete or inaccurate system emulation. Our experiments are based on our study of obfuscation techniques used to create 25,000 recent malware samples. The results show that Ether remains transparent and defeats the obfuscation tools that evade existing approaches.

756 citations

Journal Article
TL;DR: The author briefly introduces the emerging field of adversarial machine learning, in which opponents can cause traditional machine learning algorithms to behave poorly in security applications.
Abstract: The author briefly introduces the emerging field of adversarial machine learning, in which opponents can cause traditional machine learning algorithms to behave poorly in security applications. He gives a high-level overview and mentions several types of attacks, as well as several types of defenses, and theoretical limits derived from a study of near-optimal evasion.

703 citations