Journal ArticleDOI

An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

TLDR
It is found that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques, and out-of-sample bootstrap validation yields the best balance between the bias and variance.
Abstract
Defect prediction models help software quality assurance teams to allocate their limited resources to the most defect-prone modules. Model validation techniques, such as $k$-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the estimates of model validation techniques tend to be. In this paper, we investigate the bias and variance of model validation techniques in the domain of defect prediction. Analysis of 101 public defect datasets suggests that 77 percent of them are highly susceptible to producing unstable results, which makes selecting an appropriate model validation technique a critical experimental design choice. Based on an analysis of 256 studies in the defect prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of 18 systems, we find that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques. On the other hand, out-of-sample bootstrap validation yields the best balance between the bias and variance of estimates in the context of our study. Therefore, we recommend that future defect prediction studies avoid single-repetition holdout validation and instead use out-of-sample bootstrap validation.
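The out-of-sample bootstrap validation recommended in the abstract can be sketched roughly as follows: train on a bootstrap sample of the rows, then evaluate on the rows left out of that sample. This is a minimal sketch using scikit-learn and synthetic data, not the paper's exact experimental setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = []
n = len(y)
for _ in range(100):
    boot = rng.integers(0, n, size=n)        # sample row indices with replacement
    oob = np.setdiff1d(np.arange(n), boot)   # "out-of-sample" rows not drawn above
    model = LogisticRegression(max_iter=1000).fit(X[boot], y[boot])
    scores.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))

print(f"mean AUC = {np.mean(scores):.3f} (sd = {np.std(scores):.3f})")
```

Averaging over many bootstrap repetitions is what reduces the variance of the estimate relative to a single train/test split.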


Citations
Proceedings ArticleDOI

A novel neural source code representation based on abstract syntax tree

TL;DR: This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation that splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements.
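The "statement tree" idea behind ASTNN can be illustrated with Python's standard `ast` module: instead of treating a function as one large tree, split it into one small subtree per statement. This is a hypothetical sketch of the decomposition step only, not the paper's encoder:

```python
import ast

source = """
def add(a, b):
    total = a + b
    return total
"""
tree = ast.parse(source)
func = tree.body[0]            # the function definition node
# One small subtree per statement in the function body.
statement_trees = [ast.dump(stmt) for stmt in func.body]
print(len(statement_trees))    # one entry per statement
```

In ASTNN, each such statement tree would then be encoded into a vector before a sequence model combines them.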
Proceedings ArticleDOI

Automated parameter optimization of classification techniques for defect prediction models

TL;DR: This paper concludes that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques.
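Experimenting with classifier parameters rather than accepting defaults can be sketched with a simple grid search (a minimal scikit-learn sketch on synthetic data; the parameter grid is an illustrative assumption, not the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=1)

# Search a small grid of candidate settings, scoring by cross-validated AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```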
Journal ArticleDOI

The Impact of Automated Parameter Optimization on Defect Prediction Models

TL;DR: In this article, the authors study the impact of parameter optimization on defect prediction models and find that automated parameter optimization can substantially shift the importance ranking of variables, with as few as 28 percent of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers.
Journal ArticleDOI

Deep Semantic Feature Learning for Software Defect Prediction

TL;DR: This work proposes leveraging a powerful representation-learning algorithm, deep learning, to learn semantic representations of programs automatically from source code files and code changes; results indicate that the DBN-based semantic features can significantly improve the examined defect prediction tasks.
Journal ArticleDOI

What do developers search for on the web?

TL;DR: This study sheds light on why practitioners often perform some of these search tasks and why they find some of them challenging, and discusses the implications of the findings for future research in several areas.
References
Journal Article

R: A language and environment for statistical computing.

R Core Team
01 Jan 2014
TL;DR: Copyright © 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Book

Statistical Power Analysis for the Behavioral Sciences

TL;DR: The concepts of power analysis are discussed, covering tests such as the chi-square test for goodness of fit and contingency tables, the t-test for means, and the sign test.
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and the method is also applicable to regression.
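The "internal estimates" mentioned in the TL;DR are the out-of-bag (OOB) error estimates, which come for free because each tree is trained on a bootstrap sample and can be evaluated on the rows it never saw. A minimal sketch with scikit-learn's `oob_score` option on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=2)

# oob_score=True asks each tree's left-out rows to serve as a built-in test set.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=2)
forest.fit(X, y)
print(f"out-of-bag accuracy: {forest.oob_score_:.3f}")
```

Because the OOB estimate needs no separate holdout split, it is one practical reason bootstrap-style validation is attractive.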
Journal ArticleDOI

A power primer.

TL;DR: A convenient, although not comprehensive, presentation of required sample sizes is provided. The sample sizes necessary for .80 power to detect effects at these levels are tabled for eight standard statistical tests.
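A required sample size of the kind tabled in this primer can be computed directly; this is a minimal sketch assuming the `statsmodels` package is available, using Cohen's medium effect size d = 0.5 for a two-sample t-test:

```python
from statsmodels.stats.power import TTestIndPower

# Per-group sample size for a two-sided, two-sample t-test
# at effect size d = 0.5, alpha = .05, power = .80.
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n))
```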
Book

An introduction to the bootstrap

TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing the methods and examples of their use.
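The core bootstrap idea (resample the data with replacement many times and study the distribution of the recomputed statistic) can be sketched in a few lines of NumPy; this is an illustrative percentile-interval example on synthetic data, not the book's Minitab macros:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=100)

# Recompute the mean on 2000 resamples drawn with replacement.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% percentile CI = [{lo:.2f}, {hi:.2f}]")
```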