Journal ArticleDOI

An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

TLDR
It is found that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques, and out-of-sample bootstrap validation yields the best balance between the bias and variance.
Abstract
Defect prediction models help software quality assurance teams to allocate their limited resources to the most defect-prone modules. Model validation techniques, such as $k$-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the estimates of model validation techniques tend to be. In this paper, we investigate the bias and variance of model validation techniques in the domain of defect prediction. Analysis of 101 public defect datasets suggests that 77 percent of them are highly susceptible to producing unstable results, which makes selecting an appropriate model validation technique a critical experimental design choice. Based on an analysis of 256 studies in the defect prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of 18 systems, we find that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques. On the other hand, out-of-sample bootstrap validation yields the best balance between the bias and variance of estimates in the context of our study. Therefore, we recommend that future defect prediction studies avoid single-repetition holdout validation and instead use out-of-sample bootstrap validation.
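The out-of-sample bootstrap validation recommended in the abstract can be sketched roughly as follows: train on a bootstrap sample of the rows, then evaluate on the rows left out of that sample. This is a minimal sketch using scikit-learn and synthetic data, not the paper's exact experimental setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = []
n = len(y)
for _ in range(100):
    boot = rng.integers(0, n, size=n)        # sample row indices with replacement
    oob = np.setdiff1d(np.arange(n), boot)   # "out-of-sample" rows not drawn above
    model = LogisticRegression(max_iter=1000).fit(X[boot], y[boot])
    scores.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))

print(f"mean AUC = {np.mean(scores):.3f} (sd = {np.std(scores):.3f})")
```

Averaging over many bootstrap repetitions is what reduces the variance of the estimate relative to a single train/test split.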


Citations
Proceedings ArticleDOI

A novel neural source code representation based on abstract syntax tree

TL;DR: This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation that splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements.
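The "statement tree" idea behind ASTNN can be illustrated with Python's standard `ast` module: instead of treating a function as one large tree, split it into one small subtree per statement. This is a hypothetical sketch of the decomposition step only, not the paper's encoder:

```python
import ast

source = """
def add(a, b):
    total = a + b
    return total
"""
tree = ast.parse(source)
func = tree.body[0]            # the function definition node
# One small subtree per statement in the function body.
statement_trees = [ast.dump(stmt) for stmt in func.body]
print(len(statement_trees))    # one entry per statement
```

In ASTNN, each such statement tree would then be encoded into a vector before a sequence model combines them.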
Proceedings ArticleDOI

Automated parameter optimization of classification techniques for defect prediction models

TL;DR: This paper concludes that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques.
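Experimenting with classifier parameters rather than accepting defaults can be sketched with a simple grid search (a minimal scikit-learn sketch on synthetic data; the parameter grid is an illustrative assumption, not the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=1)

# Search a small grid of candidate settings, scoring by cross-validated AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```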
Journal ArticleDOI

The Impact of Automated Parameter Optimization on Defect Prediction Models

TL;DR: In this article, the authors study the impact of parameter optimization on defect prediction models and find that automated parameter optimization can substantially shift the importance ranking of variables, with as few as 28 percent of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers.
Journal ArticleDOI

Deep Semantic Feature Learning for Software Defect Prediction

TL;DR: This work proposes leveraging a powerful representation-learning algorithm, deep learning, to learn semantic representations of programs automatically from source code files and code changes; results indicate that the DBN-based semantic features can significantly improve the examined defect prediction tasks.
Journal ArticleDOI

What do developers search for on the web?

TL;DR: This study sheds light on why practitioners often perform some of these search tasks and why they find some of them challenging, and discusses the implications of the findings for future research in several areas.
References
Journal Article

R: A language and environment for statistical computing.

R Core Team
01 Jan 2014
TL;DR: Copyright © 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Book

Statistical Power Analysis for the Behavioral Sciences

TL;DR: The concepts of power analysis are discussed, covering tests such as the chi-square test for goodness of fit and contingency tables, the t-test for means, and the sign test.
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and the method is also applicable to regression.
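The "internal estimates" mentioned in the TL;DR are the out-of-bag (OOB) error estimates, which come for free because each tree is trained on a bootstrap sample and can be evaluated on the rows it never saw. A minimal sketch with scikit-learn's `oob_score` option on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=2)

# oob_score=True asks each tree's left-out rows to serve as a built-in test set.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=2)
forest.fit(X, y)
print(f"out-of-bag accuracy: {forest.oob_score_:.3f}")
```

Because the OOB estimate needs no separate holdout split, it is one practical reason bootstrap-style validation is attractive.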
Journal ArticleDOI

A power primer.

TL;DR: A convenient, although not comprehensive, presentation of required sample sizes is provided. The sample sizes necessary for .80 power to detect effects at these levels are tabled for eight standard statistical tests.
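A required sample size of the kind tabled in this primer can be computed directly; this is a minimal sketch assuming the `statsmodels` package is available, using Cohen's medium effect size d = 0.5 for a two-sample t-test:

```python
from statsmodels.stats.power import TTestIndPower

# Per-group sample size for a two-sided, two-sample t-test
# at effect size d = 0.5, alpha = .05, power = .80.
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n))
```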
Book

An introduction to the bootstrap

TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing the methods and examples of their use.
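The core bootstrap idea (resample the data with replacement many times and study the distribution of the recomputed statistic) can be sketched in a few lines of NumPy; this is an illustrative percentile-interval example on synthetic data, not the book's Minitab macros:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=100)

# Recompute the mean on 2000 resamples drawn with replacement.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% percentile CI = [{lo:.2f}, {hi:.2f}]")
```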