scispace - formally typeset
Search or ask a question
Author

Xiaofei Xie

Bio: Xiaofei Xie is an academic researcher from Nanyang Technological University. The author has contributed to research in topics: Computer science & Fuzz testing. The author has an hindex of 22, co-authored 107 publications receiving 1555 citations. Previous affiliations of Xiaofei Xie include Tianjin University & Kyushu University.

Papers published on a yearly basis

Papers
More filters
Proceedings ArticleDOI
10 Jul 2019
TL;DR: DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects of general-purpose DNNs, is proposed and a metamorphic mutation strategy to generate new semantically preserved tests is proposed, and multiple extensible coverage criteria as feedback to guide the test generation.
Abstract: The past decade has seen the great potential of applying deep neural network (DNN) based software to safety-critical scenarios, such as autonomous driving. Similar to traditional software, DNNs could exhibit incorrect behaviors, caused by hidden defects, leading to severe accidents and losses. In this paper, we propose DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects of general-purpose DNNs. To this end, we first propose a metamorphic mutation strategy to generate new semantically preserved tests, and leverage multiple extensible coverage criteria as feedback to guide the test generation. We further propose a seed selection strategy that combines both diversity-based and recency-based seed selection. We implement and incorporate 5 existing testing criteria and 4 seed selection strategies in DeepHunter. Large-scale experiments demonstrate that (1) our metamorphic mutation strategy is useful to generate new valid tests with the same semantics as the original seed, by up to a 98% validity ratio; (2) the diversity-based seed selection generally weighs more than recency-based seed selection in boosting the coverage and in detecting defects; (3) DeepHunter outperforms the state of the arts by coverage as well as the quantity and diversity of defects identified; (4) guided by corner-region based criteria, DeepHunter is useful to capture defects during the DNN quantization for platform migration.

302 citations

Proceedings ArticleDOI
15 Oct 2018
TL;DR: Hawkeye is implemented as a fuzzing framework and evaluated it on various real-world programs under different scenarios, showing that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo.
Abstract: Grey-box fuzzing is a practically effective approach to test real-world programs. However, most existing grey-box fuzzers lack directedness, i.e. the capability of executing towards user-specified target sites in the program. To emphasize existing challenges in directed fuzzing, we propose Hawkeye to feature four desired properties of directed grey-box fuzzers. Owing to a novel static analysis on the program under test and the target sites, Hawkeye precisely collects the information such as the call graph, function and basic block level distances to the targets. During fuzzing, Hawkeye evaluates exercised seeds based on both static information and the execution traces to generate the dynamic metrics, which are then used for seed prioritization, power scheduling and adaptive mutating. These strategies help Hawkeye to achieve better directedness and gravitate towards the target sites. We implemented Hawkeye as a fuzzing framework and evaluated it on various real-world programs under different scenarios. The experimental results showed that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo. Specially, Hawkeye can reduce the time to exposure for certain vulnerabilities from about 3.5 hours to 0.5 hour. By now, Hawkeye has detected more than 41 previously unknown crashes in projects such as Oniguruma, MJS with the target sites provided by vulnerability prediction tools; all these crashes are confirmed and 15 of them have been assigned CVE IDs.

174 citations

Proceedings ArticleDOI
12 Aug 2019
TL;DR: This paper model RNN as an abstract state transition system to characterize its internal behaviors and designs two trace similarity metrics and five coverage criteria which enable the quantitative analysis of RNNs, which are evaluated on four RNN-based systems covering image classification and automated speech recognition.
Abstract: Deep Learning (DL) has achieved tremendous success in many cutting-edge applications. However, the state-of-the-art DL systems still suffer from quality issues. While some recent progress has been made on the analysis of feed-forward DL systems, little study has been done on the Recurrent Neural Network (RNN)-based stateful DL systems, which are widely used in audio, natural languages and video processing, etc. In this paper, we initiate the very first step towards the quantitative analysis of RNN-based DL systems. We model RNN as an abstract state transition system to characterize its internal behaviors. Based on the abstract model, we design two trace similarity metrics and five coverage criteria which enable the quantitative analysis of RNNs. We further propose two algorithms powered by the quantitative measures for adversarial sample detection and coverage-guided test generation. We evaluate DeepStellar on four RNN-based systems covering image classification and automated speech recognition. The results demonstrate that the abstract model is useful in capturing the internal behaviors of RNNs, and confirm that (1) the similarity metrics could effectively capture the differences between samples even with very small perturbations (achieving 97% accuracy for detecting adversarial samples) and (2) the coverage criteria are useful in revealing erroneous behaviors (generating three times more adversarial samples than random testing and hundreds times more than the unrolling approach).

148 citations

Proceedings ArticleDOI
09 Jul 2020
TL;DR: This work proposes a novel approach, named FakeSpotter, based on monitoring neuron behaviors to spot AI-synthesized fake faces, conjecture that monitoring neuron behavior can also serve as an asset in detecting fake faces since layer-by-layer neuron activation patterns may capture more subtle features that are important for the fake detector.
Abstract: In recent years, generative adversarial networks (GANs) and its variants have achieved unprecedented success in image synthesis. They are widely adopted in synthesizing facial images which brings potential security concerns to humans as the fakes spread and fuel the misinformation. However, robust detectors of these AI-synthesized fake faces are still in their infancy and are not ready to fully tackle this emerging challenge. In this work, we propose a novel approach, named FakeSpotter, based on monitoring neuron behaviors to spot AI-synthesized fake faces. The studies on neuron coverage and interactions have successfully shown that they can be served as testing criteria for deep learning systems, especially under the settings of being exposed to adversarial attacks. Here, we conjecture that monitoring neuron behavior can also serve as an asset in detecting fake faces since layer-by-layer neuron activation patterns may capture more subtle features that are important for the fake detector. Experimental results on detecting four types of fake faces synthesized with the state-of-the-art GANs and evading four perturbation attacks show the effectiveness and robustness of our approach.

147 citations

Proceedings ArticleDOI
10 Nov 2019
TL;DR: Wuji is proposed, an on-the-fly game testing framework, which leverages evolutionary algorithms, DRL and multi-objective optimization to perform automatic game testing and demonstrates the effectiveness of Wuji in exploring space and detecting bugs.
Abstract: Game testing has been long recognized as a notoriously challenging task, which mainly relies on manual playing and scripting based testing in game industry. Even until recently, automated game testing still remains to be largely untouched niche. A key challenge is that game testing often requires to play the game as a sequential decision process. A bug may only be triggered until completing certain difficult intermediate tasks, which requires a certain level of intelligence. The recent success of deep reinforcement learning (DRL) sheds light on advancing automated game testing, without human competitive intelligent support. However, the existing DRLs mostly focus on winning the game rather than game testing. To bridge the gap, in this paper, we first perform an in-depth analysis of 1349 real bugs from four real-world commercial game products. Based on this, we propose four oracles to support automated game testing, and further propose Wuji, an on-the-fly game testing framework, which leverages evolutionary algorithms, DRL and multi-objective optimization to perform automatic game testing. Wuji balances between winning the game and exploring the space of the game. Winning the game allows the agent to make progress in the game, while space exploration increases the possibility of discovering bugs. We conduct a large-scale evaluation on a simple game and two popular commercial games. The results demonstrate the effectiveness of Wuji in exploring space and detecting bugs. Moreover, Wuji found 3 previously unknown bugs1, which have been confirmed by the developers, in the commercial games.

104 citations


Cited by
More filters
Posted Content
TL;DR: This work redesigns the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images, and thereby redefines the state of the art in unconditional image modeling.
Abstract: The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.

2,411 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors propose to redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images.
Abstract: The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.

2,006 citations

Journal ArticleDOI
TL;DR: This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning (RL), and outlines a few important applications of UZ methods.
Abstract: Uncertainty quantification (UQ) plays a pivotal role in reduction of uncertainties during both optimization and decision making processes. It can be applied to solve a variety of real-world applications in science and engineering. Bayesian approximation and ensemble learning techniques are two most widely-used UQ methods in the literature. In this regard, researchers have proposed different UQ methods and examined their performance in a variety of applications such as computer vision (e.g., self-driving cars and object detection), image processing (e.g., image restoration), medical image analysis (e.g., medical image classification and segmentation), natural language processing (e.g., text classification, social media texts and recidivism risk-scoring), bioinformatics, etc. This study reviews recent advances in UQ methods used in deep learning. Moreover, we also investigate the application of these methods in reinforcement learning (RL). Then, we outline a few important applications of UQ methods. Finally, we briefly highlight the fundamental research challenges faced by UQ methods and discuss the future research directions in this field.

809 citations

Journal ArticleDOI
TL;DR: This survey provides a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations, with special attention to the latest generation of DeepFakes.

502 citations