Open Access Journal Article

Data Mining Static Code Attributes to Learn Defect Predictors

TL;DR: It is shown that how static code attributes are used to build defect predictors matters much more than which particular attributes are used, and that, contrary to prior pessimism, such predictors are demonstrably useful, yielding a mean probability of detection of 71 percent and a mean false alarm rate of 25 percent.
Abstract
The value of using static code attributes to learn defect predictors has been widely debated. Prior work has explored issues like the merits of "McCabe versus Halstead versus lines of code counts" for generating defect predictors. We show here that such debates are irrelevant, since how the attributes are used to build predictors is much more important than which particular attributes are used. Also, contrary to prior pessimism, we show that such defect predictors are demonstrably useful and, on the data studied here, yield predictors with a mean probability of detection of 71 percent and a mean false alarm rate of 25 percent. These predictors would be useful for prioritizing a resource-bound exploration of code that has yet to be inspected.
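As a worked illustration of the two measures reported above, here is a minimal sketch that assumes the standard confusion-matrix definitions, with a "positive" meaning a module flagged as defective: probability of detection pd = TP / (TP + FN) and probability of false alarm pf = FP / (FP + TN). The names `Confusion`, `pd`, `pf`, `confusion_from_labels` and `mean_rates` are hypothetical helpers, not part of the paper. Reading "mean" as the average of per-data-set rates is also an assumption.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Confusion:
    """Counts from comparing a defect predictor's output with actual labels (hypothetical helper)."""
    tp: int  # defective modules flagged as defective (detections)
    fn: int  # defective modules predicted clean (misses)
    fp: int  # clean modules flagged as defective (false alarms)
    tn: int  # clean modules predicted clean


def confusion_from_labels(actual, predicted) -> Confusion:
    """Tally a Confusion from parallel sequences of booleans (True = defective)."""
    tp = fn = fp = tn = 0
    for is_defective, flagged in zip(actual, predicted):
        if is_defective and flagged:
            tp += 1
        elif is_defective:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return Confusion(tp=tp, fn=fn, fp=fp, tn=tn)


def pd(c: Confusion) -> float:
    """Probability of detection: share of truly defective modules that were flagged."""
    if c.tp + c.fn == 0:
        raise ValueError("no defective modules, pd is undefined")
    return c.tp / (c.tp + c.fn)


def pf(c: Confusion) -> float:
    """Probability of false alarm: share of truly clean modules that were flagged."""
    if c.fp + c.tn == 0:
        raise ValueError("no clean modules, pf is undefined")
    return c.fp / (c.fp + c.tn)


def mean_rates(results):
    """Average per-data-set pd and pf (assumed reading of 'mean')."""
    return fmean(pd(c) for c in results), fmean(pf(c) for c in results)
```

For example, the invented counts `Confusion(tp=30, fn=10, fp=20, tn=80)` give pd = 30/40 = 0.75 and pf = 20/100 = 0.2. A useful predictor pushes pd up while keeping pf low, which is the trade-off the abstract's 71 percent and 25 percent figures summarize.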



Citations
Journal Article

Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings

TL;DR: A framework for comparative software defect prediction experiments is proposed and applied in a large-scale empirical comparison of 22 classifiers over 10 public domain data sets from the NASA Metrics Data repository, showing an appealing degree of predictive accuracy, which supports the view that metric-based classification is useful.
Journal Article

A Systematic Literature Review on Fault Prediction Performance in Software Engineering

TL;DR: Although there are a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and which report their context, methodology, and performance comprehensively.
Proceedings Article

A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction

TL;DR: A comparative analysis of the predictive power of two different sets of metrics for defect prediction indicates that for the Eclipse data, process metrics are more efficient defect predictors than code metrics.
Proceedings Article

Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

TL;DR: This paper studied cross-project defect prediction models on a large scale and identified factors that do influence the success of cross-project predictions, and derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
Journal Article

On the relative value of cross-company and within-company data for defect prediction

TL;DR: It is demonstrated in this paper that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected quickly within a few months.