Open Access · Journal Article

A Systematic Literature Review on Fault Prediction Performance in Software Engineering

TL;DR: Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and report their context, methodology, and performance comprehensively.
Abstract
Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs, and improve the quality of software.

Objective: We investigate how the context of models, the independent variables used, and the modeling techniques applied influence the performance of fault prediction models.

Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesize the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply.

Results: The models that perform well tend to be based on simple modeling techniques such as Naive Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well.

Conclusion: The methodology used to build models seems to be influential to predictive performance. Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and report their context, methodology, and performance comprehensively.
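The review's headline result, that simple learners such as Naive Bayes and Logistic Regression paired with feature selection tend to perform well, can be illustrated with a short sketch. This is a minimal illustration on synthetic data under a scikit-learn setup of our own choosing; the metric, feature counts, and class skew are assumptions, not the review's experimental protocol.

```python
# Minimal sketch, NOT the review's protocol: simple learners plus feature
# selection on synthetic, imbalanced data standing in for a module metrics table.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed stand-in for static code + churn metrics with a faulty/not-faulty label.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           weights=[0.85, 0.15], random_state=0)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    # Scale, keep the 10 strongest features, then fit the simple learner.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), model)
    auc = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```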



Citations
Journal Article

A large-scale empirical study of just-in-time quality assurance

TL;DR: The findings indicate that “Just-In-Time Quality Assurance” may provide an effort-reducing way to focus on the most risky changes and thus reduce the costs of developing high-quality software.
Journal Article

Using Class Imbalance Learning for Software Defect Prediction

TL;DR: This paper investigates different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms, and concludes that AdaBoost.NC shows the best overall performance in terms of measures including balance, G-mean, and Area Under the Curve (AUC).
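Two of the method families this paper compares, resampling and threshold moving, can be sketched briefly. AdaBoost.NC itself has no scikit-learn implementation, so it is not shown; the data, learner, and thresholds below are illustrative assumptions.

```python
# Hedged sketch of random over-sampling and threshold moving; not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Random over-sampling: duplicate minority-class rows until the classes balance.
rng = np.random.default_rng(1)
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
clf = LogisticRegression(max_iter=1000).fit(np.vstack([X_tr, X_tr[extra]]),
                                            np.concatenate([y_tr, y_tr[extra]]))
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))

# Threshold moving: lower the decision threshold instead of resampling.
for threshold in (0.5, 0.3, 0.1):
    recall = ((proba >= threshold) & (y_te == 1)).sum() / (y_te == 1).sum()
    print(f"threshold={threshold}: minority recall={recall:.2f}")
```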
Journal Article

Data Quality: Some Comments on the NASA Software Defect Datasets

TL;DR: The extent to which published analyses based on the NASA defect datasets are meaningful and comparable is investigated and it is recommended that researchers indicate the provenance of the datasets they use and invest effort in understanding the data prior to applying machine learners.
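The paper's recommendation to understand data before applying learners suggests checks like the following sketch. The file name and the `defective` column are hypothetical placeholders, not the actual NASA dataset schema.

```python
# Hedged sketch of data-quality checks; column names and file are hypothetical.
import pandas as pd

df = pd.read_csv("nasa_module_metrics.csv")  # hypothetical path, not a real file

# Exact duplicate instances can inflate apparent performance under cross-validation.
print("duplicate rows:", df.duplicated().sum())

# Inconsistent instances: identical metric values but conflicting fault labels.
features = [c for c in df.columns if c != "defective"]
conflicts = df.groupby(features)["defective"].nunique()
print("conflicting instances:", (conflicts > 1).sum())

# Implausible values, e.g. negative counts in size or complexity metrics.
print("rows with negative metrics:", (df[features] < 0).any(axis=1).sum())
```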
Journal Article

Software fault prediction metrics

TL;DR: Object-oriented and process metrics have been reported to be more successful in finding faults than traditional size and complexity metrics, and process metrics seem to be better at predicting post-release faults than any static code metrics.
Journal Article

An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

TL;DR: It is found that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques, and out-of-sample bootstrap validation yields the best balance between bias and variance.
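The contrast the paper draws can be sketched as follows: a single holdout split versus an out-of-sample bootstrap that trains on a resample and tests on the rows the resample missed. The data and learner below are illustrative assumptions, not the paper's study design.

```python
# Hedged sketch: single-repetition holdout vs. out-of-sample bootstrap AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

# Single-repetition holdout: one split, so the estimate varies a lot with the seed.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("holdout AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

# Out-of-sample bootstrap: train on a bootstrap sample, test on the rows it
# missed, and average over many repetitions.
rng = np.random.default_rng(0)
scores = []
for _ in range(100):
    boot = rng.integers(0, len(X), len(X))
    out = np.setdiff1d(np.arange(len(X)), boot)
    clf = LogisticRegression(max_iter=1000).fit(X[boot], y[boot])
    scores.append(roc_auc_score(y[out], clf.predict_proba(X[out])[:, 1]))
print("bootstrap AUC:", np.mean(scores))
```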
References
Book

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.

A Practical Guide to Support Vector Classification

TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.
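The guide's proposed procedure, scale the features, use an RBF kernel, and grid-search C and gamma on an exponential grid with cross-validation, can be sketched as follows. The dataset is a placeholder assumption; the exponential grid ranges follow the guide's suggestion.

```python
# Minimal sketch of the guide's procedure on placeholder data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # assumed stand-in data

# Scale each feature to [0, 1], then fit an RBF-kernel SVM.
pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))

# Exponentially spaced grid over C and gamma, as the guide recommends.
grid = {"svc__C": [2.0**k for k in range(-5, 16, 2)],
        "svc__gamma": [2.0**k for k in range(-15, 4, 2)]}
search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```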
Journal Article

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
Proceedings Article

The relationship between Precision-Recall and ROC curves

TL;DR: It is shown that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space.
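One way to see why this connection matters in practice is to compute both summary areas for one classifier on heavily skewed data, as in this hedged sketch; the data and learner are assumptions, not the paper's experiments.

```python
# Hedged sketch: ROC AUC vs. PR AUC (average precision) on skewed data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# ROC AUC can look strong under heavy skew, while the PR summary is more
# sensitive to how many of the flagged positives are actually correct.
print("ROC AUC:", roc_auc_score(y_te, proba))
print("PR AUC (average precision):", average_precision_score(y_te, proba))
```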
Journal Article

A study of the behavior of several methods for balancing machine learning training data

TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC).
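As a hedged companion to the over-sampling sketch earlier in this list, random under-sampling discards majority-class rows instead; again, the data and learner are assumptions rather than this paper's evaluation.

```python
# Hedged sketch of random under-sampling evaluated by AUC; not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

# Keep all minority rows and a same-sized random subset of the majority rows.
rng = np.random.default_rng(3)
minority = np.where(y_tr == 1)[0]
majority = rng.choice(np.where(y_tr == 0)[0], size=minority.size, replace=False)
keep = np.concatenate([minority, majority])

clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
print("under-sampling AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```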