Author

Yong Tang

Bio: Yong Tang is an academic researcher from National University of Defense Technology. The author has contributed to research in topics: Fuzz testing & Binary code. The author has an h-index of 5 and has co-authored 10 publications receiving 78 citations.

Papers
Proceedings Article
01 Jan 2020
TL;DR: A variant of the Adversarial Multi-Armed Bandit model for modeling AFL’s power schedule process is presented and a unique adaptive scheduling algorithm as well as a probability-based search strategy are developed.
Abstract: Fuzzing is one of the most effective approaches for identifying security vulnerabilities. As a state-of-the-art coverage-based greybox fuzzer, AFL is a highly effective and widely used technique. However, AFL allocates excessive energy (i.e., the number of test cases generated from a seed) to seeds that exercise high-frequency paths and cannot adaptively adjust the energy allocation, thus wasting a significant amount of energy. Moreover, the current Markov model for modeling coverage-based greybox fuzzing is not profound enough. This paper presents a variant of the Adversarial Multi-Armed Bandit model for modeling AFL’s power schedule process. We first explain the challenges in AFL’s scheduling algorithm by using the reward probability of generating a test case that discovers a new path. Moreover, we illustrate the three states of the seed set and develop a unique adaptive scheduling algorithm as well as a probability-based search strategy. These approaches are implemented on top of AFL in an adaptive energy-saving greybox fuzzer called EcoFuzz. EcoFuzz is evaluated against six other AFL-type tools on 14 real-world subjects over 490 CPU days. According to the results, EcoFuzz attained 214% of the path coverage of AFL while generating 32% fewer test cases than AFL. In addition, EcoFuzz identified 12 vulnerabilities in GNU Binutils and other software. We also extended EcoFuzz to test some IoT devices and found a new vulnerability in the SNMP component.

66 citations
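
To make the bandit framing in the abstract above concrete, here is a minimal sketch of the textbook Exp3 algorithm for adversarial multi-armed bandits, with each seed treated as an arm and a reward of 1 when a generated test case discovers a new path. This is not EcoFuzz's scheduling algorithm, which the abstract describes only as a variant of the model; the names `exp3_schedule` and `toy_reward` and the values in `NEW_PATH_PROB` are hypothetical and chosen purely for illustration.

```python
import math
import random

# Hypothetical per-seed probabilities of discovering a new path (toy values).
NEW_PATH_PROB = [0.01, 0.02, 0.30]


def toy_reward(seed, rng):
    """Return 1.0 if the simulated test case finds a new path, else 0.0."""
    return 1.0 if rng.random() < NEW_PATH_PROB[seed] else 0.0


def exp3_schedule(reward_fn, n_seeds, rounds, gamma=0.1, rng=None):
    """Run textbook Exp3 and return how often each seed was selected."""
    rng = rng or random.Random(0)
    weights = [1.0] * n_seeds
    pulls = [0] * n_seeds
    for _ in range(rounds):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_seeds for w in weights]
        seed = rng.choices(range(n_seeds), weights=probs)[0]
        reward = reward_fn(seed, rng)
        pulls[seed] += 1
        # Importance weighting keeps the reward estimate unbiased.
        estimate = reward / probs[seed]
        weights[seed] *= math.exp(gamma * estimate / n_seeds)
        # Rescale so the weights never overflow; probabilities are unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return pulls


if __name__ == "__main__":
    print(exp3_schedule(toy_reward, n_seeds=3, rounds=2000))
```

In this toy setting the selection counts should concentrate on the seed with the highest chance of reaching a new path, which is the kind of energy reallocation the abstract argues AFL lacks. A real fuzzer would also have to cope with reward probabilities that change as paths are exhausted.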

Journal Article
Bo Yu, Fang Ying, Qiang Yang, Yong Tang, Liu Liu
TL;DR: This paper conducts a survey on malware behavior description and analysis considering three aspects: malware behavior description, behavior analysis methods, and visualization techniques.
Abstract: Behavior-based malware analysis is an important technique for automatically analyzing and detecting malware, and it has received considerable attention from both academic and industrial communities. By considering how malware behaves, we can tackle the malware obfuscation problem, which cannot be processed by traditional static analysis approaches, and we can also derive the as-built behavior specifications and cover the entire behavior space of the malware samples. Although there have been several works focusing on malware behavior analysis, such research is far from mature, and no overviews have been put forward to date to investigate current developments and challenges. In this paper, we conduct a survey on malware behavior description and analysis considering three aspects: malware behavior description, behavior analysis methods, and visualization techniques. First, existing behavior data types and emerging techniques for malware behavior description are explored, especially the goals, principles, characteristics, and classifications of behavior analysis techniques proposed in the existing approaches. Second, the inadequacies and challenges in malware behavior analysis are summarized from different perspectives. Finally, several possible directions are discussed for future research.

34 citations

Proceedings Article
06 Nov 2019
TL;DR: IoTHunter is presented, the first grey-box fuzzer for fuzzing stateful protocols in IoT firmware, which addresses the state scheduling problem based on a multi-stage message generation mechanism on runtime monitoring of IoT firmware.
Abstract: In this work, we present IoTHunter, the first grey-box fuzzer for fuzzing stateful protocols in IoT firmware. IoTHunter addresses the state scheduling problem based on a multi-stage message generation mechanism on runtime monitoring of IoT firmware. We evaluate IoTHunter with a set of real-world programs, and the results show that IoTHunter outperforms the black-box fuzzer boofuzz, achieving 2.2x, 2.0x, and 2.5x increases in function coverage, block coverage, and edge coverage, respectively. IoTHunter also found five new vulnerabilities in the firmware of the Mikrotik home router, which have been reported to the vendor.

23 citations

Journal Article
TL;DR: This paper presents a knowledge-learn evolutionary fuzzer based on AFL, called LearnAFL, which can learn partial format knowledge of some paths by analyzing the test cases that exercise the paths and uses this format information to mutate the seeds.
Abstract: Mutation-based greybox fuzzing is a highly effective and widely used technique to find bugs in software. Provided initial seeds, fuzzers continuously generate test cases to test the software by mutating a seed input. However, the majority of these test cases are “invalid” because the mutation may destroy the format of the seeds. In this paper, we present a knowledge-learn evolutionary fuzzer based on AFL, which is called LearnAFL. LearnAFL does not require any prior knowledge of the application or input format. Based on our format generation theory, LearnAFL can learn partial format knowledge of some paths by analyzing the test cases that exercise the paths. LearnAFL then uses this format information to mutate the seeds, which explores deeper paths more efficiently and generates fewer test cases exercising high-frequency paths than AFL. We compared LearnAFL with AFL and some other state-of-the-art fuzzers on ten real-world programs. The results showed that LearnAFL could reach 120% and 110% of the branch coverage of AFL and FairFuzz, respectively. LearnAFL also found 8 unknown vulnerabilities in GNU Binutils, Libpng and Gif2png, all of which have been reported to the vendors. In addition, we compared the format information learned from the initial seed of an ELF file with a format standard of ELF files. The results showed that LearnAFL learns about 64% of the file format without any prior knowledge.

19 citations
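
The idea of learning partial format knowledge from test cases that exercise the same path can be illustrated with a deliberately simplified sketch: treat every byte offset that holds the same value in all such test cases as format-critical, and mutate only the remaining offsets. The abstract does not describe LearnAFL's format generation theory in detail, so this is an assumption-laden stand-in and not the paper's method; `learn_fixed_offsets` and `mutate_preserving` are hypothetical names.

```python
import random


def learn_fixed_offsets(test_cases):
    """Return {offset: byte} for offsets whose byte is identical in every input."""
    if not test_cases:
        return {}
    shortest = min(len(tc) for tc in test_cases)
    first = test_cases[0]
    return {
        i: first[i]
        for i in range(shortest)
        if all(tc[i] == first[i] for tc in test_cases)
    }


def mutate_preserving(seed, fixed, rng):
    """Overwrite one random byte, skipping offsets believed to be format-critical."""
    free = [i for i in range(len(seed)) if i not in fixed]
    if not free:
        return seed  # nothing can be mutated without touching the learned format
    out = bytearray(seed)
    out[rng.choice(free)] = rng.randrange(256)
    return bytes(out)


if __name__ == "__main__":
    # Toy inputs sharing the four-byte ELF magic number.
    same_path = [b"\x7fELF\x01\x01", b"\x7fELF\x02\x01", b"\x7fELF\x02\x02"]
    fixed = learn_fixed_offsets(same_path)
    print(sorted(fixed))
    print(mutate_preserving(same_path[0], fixed, random.Random(1)))
```

With the toy inputs, only the four magic-number offsets are learned as fixed, so every mutated test case still begins with a valid ELF magic and is less likely to be rejected as "invalid" early in parsing.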

Book Chapter
03 Jul 2017
TL;DR: The experimental results demonstrate that the ensemble learning based dynamic malware classification approach can classify malware variants with a high F1-score while imposing low classification time on datasets of different scales.
Abstract: Dynamic analysis plays an important role in analyzing malware variants which have used obfuscation, polymorphism and metamorphism techniques. Malware classification is an emerging approach for discriminating different malware families. However, existing malware classification methods have mediocre performance on small-scale datasets, and some machine learning algorithms have difficulties in handling imbalanced datasets. To solve these issues, we propose an ensemble learning based dynamic malware classification approach aimed at datasets of different scales. Additionally, a novel feature selection method is presented to select features with strong discrimination power. In particular, we continue to explore issues in feature representation and feature selection. To verify the efficiency of our approach, we perform a series of comparative experiments with existing feature selection methods, commercial anti-malware tools and current malware classification techniques. The experimental results demonstrate that our approach can classify malware variants with a high F1-score while imposing low classification time on datasets of different scales.

13 citations
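
As a small worked example of the two notions in the abstract above, ensemble classification and the F1-score, the sketch below combines the predictions of three classifiers by majority vote and then scores the result. The abstract does not say how the ensemble combines its members, so majority voting is an assumption; the family labels and predictions are made-up toy data, and `majority_vote` and `f1_score` are hypothetical helper names.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy predictions from three classifiers over four samples.
    predictions = [
        ["worm", "worm", "trojan", "trojan"],
        ["worm", "trojan", "trojan", "worm"],
        ["trojan", "worm", "trojan", "trojan"],
    ]
    truth = ["worm", "worm", "trojan", "worm"]
    voted = majority_vote(predictions)
    print(voted, f1_score(truth, voted, positive="worm"))
```

Here the vote yields `worm, worm, trojan, trojan`; for the `worm` class that gives precision 1 and recall 2/3, so the F1-score is 0.8. With an odd number of voters and two classes no ties can occur; a real system would need an explicit tie-breaking rule.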


Cited by
Journal Article
TL;DR: This paper finds that an ensemble of recurrent neural networks is able to predict whether an executable is malicious or benign within the first 5 seconds of execution with 94% accuracy, which enables cyber security endpoint protection to be advanced to use behavioural data for blocking malicious payloads rather than detecting them post-execution and having to repair the damage.

205 citations

Journal Article
TL;DR: A dynamic analysis for IoT malware detection (DAIMD) is proposed to reduce damage to IoT devices by detecting both well-known IoT malware and new and variant IoT malware that has evolved intelligently.
Abstract: Internet of Things (IoT) technology provides the basic infrastructure for a hyper-connected society where all things are connected and exchange information through the Internet. IoT technology is fused with 5G and artificial intelligence (AI) technologies for use in various fields such as the smart city and smart factory. As the demand for IoT technology increases, security threats against IoT infrastructure, applications, and devices have also increased. A variety of studies have been conducted on the detection of IoT malware to avoid the threats posed by malicious code. While existing models may accurately detect malicious IoT code identified through static analysis, detecting the new and variant IoT malware that is quickly being generated may become challenging. This paper proposes a dynamic analysis for IoT malware detection (DAIMD) to reduce damage to IoT devices by detecting both well-known IoT malware and new and variant IoT malware that has evolved intelligently. The DAIMD scheme learns IoT malware using a convolutional neural network (CNN) model and analyzes IoT malware dynamically in a nested cloud environment, extracting behaviors related to memory, network, virtual file system, process, and system call. The extracted and analyzed behavior data are converted into images, and these behavior images of IoT malware are then classified and used to train the CNN. DAIMD can minimize the infection damage of IoT devices from malware by visualizing and learning the vast amount of behavior data generated through dynamic analysis.

81 citations

Journal Article
TL;DR: A detailed meta-review of the existing surveys related to malware and its detection techniques, showing an arms race between these two sides of a barricade, is presented in this article.
Abstract: Cyber attacks are currently blooming, as the attackers reap significant profits from them and face a limited risk when compared to committing the “classical” crimes. One of the major components that leads to the successful compromise of the targeted system is malicious software. It allows using the victim’s machine for various nefarious purposes, e.g., making it a part of a botnet, mining cryptocurrencies, or holding hostage the data stored there. At present, the complexity, proliferation, and variety of malware pose a real challenge for the existing countermeasures and require their constant improvement. That is why, in this paper, we first perform a detailed meta-review of the existing surveys related to malware and its detection techniques, showing an arms race between these two sides of a barricade. On this basis, we review the evolution of modern threats in communication networks, with a particular focus on the techniques employing information hiding. Next, we present a bird’s-eye view portraying the main development trends in detection methods, with a special emphasis on machine learning techniques. The survey is concluded with a description of potential future research directions in the field of malware detection.

63 citations

Journal Article
TL;DR: Two novel techniques, incremental bagging (iBagging) and enhanced semi-random subspace selection (ESRS), are proposed and incorporated into an ensemble-based detection model that achieves higher detection accuracy than existing solutions.

52 citations