Open Access · Journal Article

A Systematic Literature Review on Fault Prediction Performance in Software Engineering

TL;DR: Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and report their context, methodology, and performance comprehensively.
Abstract
Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs, and improve the quality of software.

Objective: We investigate how the context of models, the independent variables used, and the modeling techniques applied influence the performance of fault prediction models.

Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesize the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply.

Results: The models that perform well tend to be based on simple modeling techniques such as Naive Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well.

Conclusion: The methodology used to build models seems to be influential to predictive performance. Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and report their context, methodology, and performance comprehensively.
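The review's headline result, that simple learners such as Naive Bayes and Logistic Regression paired with feature selection tend to perform well, can be illustrated with a short sketch. This is a minimal illustration on synthetic data under a scikit-learn setup of our own choosing; the metric, feature counts, and class skew are assumptions, not the review's experimental protocol.

```python
# Minimal sketch, NOT the review's protocol: simple learners plus feature
# selection on synthetic, imbalanced data standing in for a module metrics table.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed stand-in for static code + churn metrics with a faulty/not-faulty label.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           weights=[0.85, 0.15], random_state=0)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    # Scale, keep the 10 strongest features, then fit the simple learner.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), model)
    auc = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```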



Citations
Journal Article

A large-scale empirical study of just-in-time quality assurance

TL;DR: The findings indicate that “Just-In-Time Quality Assurance” may provide an effort-reducing way to focus on the most risky changes and thus reduce the costs of developing high-quality software.
Journal Article

Using Class Imbalance Learning for Software Defect Prediction

TL;DR: This paper investigates different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms, and concludes that AdaBoost.NC shows the best overall performance in terms of measures including balance, G-mean, and Area Under the Curve (AUC).
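Two of the method families this paper compares, resampling and threshold moving, can be sketched briefly. AdaBoost.NC itself has no scikit-learn implementation, so it is not shown; the data, learner, and thresholds below are illustrative assumptions.

```python
# Hedged sketch of random over-sampling and threshold moving; not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Random over-sampling: duplicate minority-class rows until the classes balance.
rng = np.random.default_rng(1)
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
clf = LogisticRegression(max_iter=1000).fit(np.vstack([X_tr, X_tr[extra]]),
                                            np.concatenate([y_tr, y_tr[extra]]))
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))

# Threshold moving: lower the decision threshold instead of resampling.
for threshold in (0.5, 0.3, 0.1):
    recall = ((proba >= threshold) & (y_te == 1)).sum() / (y_te == 1).sum()
    print(f"threshold={threshold}: minority recall={recall:.2f}")
```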
Journal Article

Data Quality: Some Comments on the NASA Software Defect Datasets

TL;DR: The extent to which published analyses based on the NASA defect datasets are meaningful and comparable is investigated and it is recommended that researchers indicate the provenance of the datasets they use and invest effort in understanding the data prior to applying machine learners.
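The paper's recommendation to understand data before applying learners suggests checks like the following sketch. The file name and the `defective` column are hypothetical placeholders, not the actual NASA dataset schema.

```python
# Hedged sketch of data-quality checks; column names and file are hypothetical.
import pandas as pd

df = pd.read_csv("nasa_module_metrics.csv")  # hypothetical path, not a real file

# Exact duplicate instances can inflate apparent performance under cross-validation.
print("duplicate rows:", df.duplicated().sum())

# Inconsistent instances: identical metric values but conflicting fault labels.
features = [c for c in df.columns if c != "defective"]
conflicts = df.groupby(features)["defective"].nunique()
print("conflicting instances:", (conflicts > 1).sum())

# Implausible values, e.g. negative counts in size or complexity metrics.
print("rows with negative metrics:", (df[features] < 0).any(axis=1).sum())
```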
Journal Article

Software fault prediction metrics

TL;DR: Object-oriented and process metrics have been reported to be more successful in finding faults than traditional size and complexity metrics, and process metrics seem to be better at predicting post-release faults than any static code metrics.
Journal Article

An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

TL;DR: It is found that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques, and out-of-sample bootstrap validation yields the best balance between bias and variance.
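The contrast the paper draws can be sketched as follows: a single holdout split versus an out-of-sample bootstrap that trains on a resample and tests on the rows the resample missed. The data and learner below are illustrative assumptions, not the paper's study design.

```python
# Hedged sketch: single-repetition holdout vs. out-of-sample bootstrap AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

# Single-repetition holdout: one split, so the estimate varies a lot with the seed.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("holdout AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

# Out-of-sample bootstrap: train on a bootstrap sample, test on the rows it
# missed, and average over many repetitions.
rng = np.random.default_rng(0)
scores = []
for _ in range(100):
    boot = rng.integers(0, len(X), len(X))
    out = np.setdiff1d(np.arange(len(X)), boot)
    clf = LogisticRegression(max_iter=1000).fit(X[boot], y[boot])
    scores.append(roc_auc_score(y[out], clf.predict_proba(X[out])[:, 1]))
print("bootstrap AUC:", np.mean(scores))
```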
References
Book

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.

A Practical Guide to Support Vector Classification

TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.
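The guide's proposed procedure, scale the features, use an RBF kernel, and grid-search C and gamma on an exponential grid with cross-validation, can be sketched as follows. The dataset is a placeholder assumption; the exponential grid ranges follow the guide's suggestion.

```python
# Minimal sketch of the guide's procedure on placeholder data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # assumed stand-in data

# Scale each feature to [0, 1], then fit an RBF-kernel SVM.
pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))

# Exponentially spaced grid over C and gamma, as the guide recommends.
grid = {"svc__C": [2.0**k for k in range(-5, 16, 2)],
        "svc__gamma": [2.0**k for k in range(-15, 4, 2)]}
search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```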
Journal Article

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
Proceedings Article

The relationship between Precision-Recall and ROC curves

TL;DR: It is shown that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space.
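One way to see why this connection matters in practice is to compute both summary areas for one classifier on heavily skewed data, as in this hedged sketch; the data and learner are assumptions, not the paper's experiments.

```python
# Hedged sketch: ROC AUC vs. PR AUC (average precision) on skewed data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# ROC AUC can look strong under heavy skew, while the PR summary is more
# sensitive to how many of the flagged positives are actually correct.
print("ROC AUC:", roc_auc_score(y_te, proba))
print("PR AUC (average precision):", average_precision_score(y_te, proba))
```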
Journal Article

A study of the behavior of several methods for balancing machine learning training data

TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC).
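As a hedged companion to the over-sampling sketch earlier in this list, random under-sampling discards majority-class rows instead; again, the data and learner are assumptions rather than this paper's evaluation.

```python
# Hedged sketch of random under-sampling evaluated by AUC; not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

# Keep all minority rows and a same-sized random subset of the majority rows.
rng = np.random.default_rng(3)
minority = np.where(y_tr == 1)[0]
majority = rng.choice(np.where(y_tr == 0)[0], size=minority.size, replace=False)
keep = np.concatenate([minority, majority])

clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
print("under-sampling AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```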