DeepHunter: a coverage-guided fuzz testing framework for deep neural networks

doi:10.1145/3293882.3330579

Proceedings Article


Xiaofei Xie, +9 more

- pp 146-157


TLDR

DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects of general-purpose DNNs, is proposed. It uses a metamorphic mutation strategy to generate new semantically preserved tests and leverages multiple extensible coverage criteria as feedback to guide the test generation.

Abstract:

The past decade has seen the great potential of applying deep neural network (DNN) based software to safety-critical scenarios, such as autonomous driving. Similar to traditional software, DNNs could exhibit incorrect behaviors, caused by hidden defects, leading to severe accidents and losses. In this paper, we propose DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects of general-purpose DNNs. To this end, we first propose a metamorphic mutation strategy to generate new semantically preserved tests, and leverage multiple extensible coverage criteria as feedback to guide the test generation. We further propose a seed selection strategy that combines both diversity-based and recency-based seed selection. We implement and incorporate 5 existing testing criteria and 4 seed selection strategies in DeepHunter. Large-scale experiments demonstrate that (1) our metamorphic mutation strategy is useful for generating new valid tests with the same semantics as the original seed, with a validity ratio of up to 98%; (2) diversity-based seed selection generally contributes more than recency-based seed selection to boosting coverage and detecting defects; (3) DeepHunter outperforms the state of the art in coverage as well as in the quantity and diversity of defects identified; (4) guided by corner-region based criteria, DeepHunter is useful for capturing defects introduced during DNN quantization for platform migration.
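The loop described in the abstract (mutate a seed, check that the prediction is preserved, keep mutants that add coverage) can be illustrated with a minimal sketch. This is not DeepHunter's implementation: `toy_model`, `coverage_of`, `mutate`, and `fuzz` are hypothetical names, the fixed-weight "network" is a stand-in for a real DNN, bounded additive noise stands in for the metamorphic mutation, bucketed activations stand in for the coverage criteria, and recency-weighted sampling is a simplification of the seed selection strategy.

```python
import random


def toy_model(x):
    """Hypothetical stand-in for a DNN: returns (hidden activations, predicted label)."""
    weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4], [0.2, 0.2, -0.6]]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    label = max(range(len(hidden)), key=lambda i: hidden[i])
    return hidden, label


def coverage_of(hidden, buckets=4, upper=1.0):
    """Assumed criterion: one coverage item per (neuron index, activation bucket)."""
    items = set()
    for i, h in enumerate(hidden):
        b = min(buckets - 1, int(min(h, upper) / upper * buckets))
        items.add((i, b))
    return items


def mutate(x, origin, rng, step=0.05, budget=0.2):
    """Add small noise, clamped to stay within `budget` of the original seed."""
    out = []
    for v, o in zip(x, origin):
        nv = v + rng.uniform(-step, step)
        out.append(max(o - budget, min(o + budget, nv)))
    return out


def fuzz(seeds, iterations=200, rng_seed=0):
    rng = random.Random(rng_seed)
    queue = []      # entries: (current input, original seed, expected label)
    covered = set()
    for s in seeds:
        hidden, label = toy_model(s)
        covered |= coverage_of(hidden)
        queue.append((s, s, label))
    failures = []
    for _ in range(iterations):
        # Recency-biased selection: entries added later get larger weights.
        weights = [i + 1 for i in range(len(queue))]
        x, origin, expected = rng.choices(queue, weights=weights, k=1)[0]
        candidate = mutate(x, origin, rng)
        hidden, label = toy_model(candidate)
        if label != expected:
            # Prediction changed under a small perturbation: record as a failure.
            failures.append((candidate, expected, label))
            continue
        new_items = coverage_of(hidden) - covered
        if new_items:
            # Keep mutants that increase coverage as new seeds.
            covered |= new_items
            queue.append((candidate, origin, expected))
    return covered, failures, queue


if __name__ == "__main__":
    covered, failures, queue = fuzz([[0.9, 0.1, 0.1], [0.1, 0.9, 0.1]])
    print(len(covered), "coverage items,", len(failures), "failures,", len(queue), "seeds")
```

Clamping each mutant to a fixed distance from its original seed is one simple way to approximate "semantically preserved"; the paper's own mutation strategy and validity check may differ.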

Citations


Proceedings Article

DeepStellar: model-based quantitative analysis of stateful deep learning systems

Xiaoning Du, +5 more

TL;DR: This paper models an RNN as an abstract state transition system to characterize its internal behaviors, and designs two trace similarity metrics and five coverage criteria that enable the quantitative analysis of RNNs. These are evaluated on four RNN-based systems covering image classification and automated speech recognition.


Journal Article

Testing machine learning based systems: a systematic mapping

Vincenzo Riccio, +5 more

- 01 Nov 2020 -

Empirical Software Engineering

TL;DR: A systematic mapping study of testing techniques for machine learning based systems (MLSs), driven by 33 research questions, that investigates multiple aspects of the testing approaches, such as the used/proposed adequacy criteria, the algorithms for test input generation, and the test oracles.


Proceedings Article

An Empirical Study of Common Challenges in Developing Deep Learning Applications

Tianyi Zhang, +4 more

TL;DR: A large-scale empirical study of deep learning questions in a popular Q&A website, Stack Overflow, finds that program crashes, model migration, and implementation questions are the top three most frequently asked questions.


Proceedings Article

Wuji: automatic online combat game testing using evolutionary deep reinforcement learning

Yan Zheng, +9 more

TL;DR: This work proposes Wuji, an on-the-fly game testing framework that leverages evolutionary algorithms, DRL, and multi-objective optimization to perform automatic game testing, and demonstrates its effectiveness in exploring the space and detecting bugs.


Proceedings Article

Deep learning library testing via effective model generation

Zan Wang, +4 more

TL;DR: This work designs a series of mutation rules for DL models, with the purpose of exploring different invoking sequences of library code and hard-to-trigger behaviors, and proposes a heuristic strategy that guides the model generation process toward amplifying the degree of inconsistency between different DL libraries caused by bugs.


References


Journal Article

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.


Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.


Human-level control through deep reinforcement learning

Mastering the game of Go with deep neural networks and tree search

TensorFlow: a system for large-scale machine learning

Related Papers (5)

DeepXplore: Automated Whitebox Testing of Deep Learning Systems

DeepTest: automated testing of deep-neural-network-driven autonomous cars

DeepGauge: multi-granularity testing criteria for deep learning systems

Explaining and Harnessing Adversarial Examples

Towards Evaluating the Robustness of Neural Networks