Proceedings ArticleDOI

Where the bugs are

Thomas J. Ostrand, +2 more
- Vol. 29, Iss: 4, pp 86-96
TLDR
A negative binomial regression model using information from previous releases was developed and used to predict the numbers of faults for a large industrial inventory system; the 20% of files it ranked highest contained between 71% and 92% of the faults.
Abstract
The ability to predict which files in a large software system are most likely to contain the largest numbers of faults in the next release can be a very valuable asset. To accomplish this, a negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system. The files of each release were sorted in descending order based on the predicted number of faults and then the first 20% of the files were selected. This was done for each of fifteen consecutive releases, representing more than four years of field usage. The predictions were extremely accurate, correctly selecting files that contained between 71% and 92% of the faults, with the overall average being 83%. In addition, the same model was used on data for the same system's releases, but with all fault data prior to integration testing removed. The prediction was again very accurate, ranging from 71% to 93%, with the average being 84%. Predictions were made for a second system, and again the first 20% of files accounted for 83% of the identified faults. Finally, a highly simplified predictor was considered which correctly predicted 73% and 74% of the faults for the two systems.
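The evaluation step described in the abstract (sort files by predicted fault count, take the first 20%, and report the share of actual faults they contain) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `top_fraction_fault_share`, the rounding of the 20% cutoff, and the sample numbers are all assumptions, and the predicted counts stand in for the output of a fitted negative binomial regression model.

```python
def top_fraction_fault_share(predicted, actual, fraction=0.20):
    """Return the share of actual faults found in the top `fraction`
    of files when files are ranked by predicted fault count.

    `predicted` and `actual` are parallel lists, one entry per file.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one file is required")

    total_faults = sum(actual)
    if total_faults == 0:
        return 0.0

    # Rank file indices in descending order of predicted fault count.
    ranked = sorted(range(len(predicted)),
                    key=lambda i: predicted[i],
                    reverse=True)

    # Assumption: the cutoff is rounded to the nearest whole file,
    # and at least one file is always selected.
    cutoff = max(1, round(fraction * len(ranked)))
    selected = ranked[:cutoff]

    return sum(actual[i] for i in selected) / total_faults


# Hypothetical data for ten files (not taken from the paper).
predicted = [5.2, 0.1, 3.3, 0.4, 0.0, 1.1, 0.2, 0.3, 0.6, 0.05]
actual = [6, 0, 3, 1, 0, 1, 0, 0, 1, 0]

share = top_fraction_fault_share(predicted, actual)
print(f"{share:.0%}")  # prints "75%"
```

With ten files, the top 20% is the two files with the highest predicted counts; in this made-up data they hold 9 of the 12 faults. The percentages quoted in the abstract are the same quantity computed per release on the real systems.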


Citations
Journal ArticleDOI

A Systematic Literature Review on Fault Prediction Performance in Software Engineering

TL;DR: Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and that report their context, methodology, and performance comprehensively.
Proceedings ArticleDOI

Use of relative code churn measures to predict system defect density

TL;DR: A technique for early prediction of system defect density using a set of relative code churn measures that relate the amount of churn to other variables such as component size and the temporal extent of churn, which shows that while absolute measures of code churn are poor predictors of defect density, the relative measures are highly predictive of defect density.
Proceedings ArticleDOI

Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

TL;DR: This paper studied cross-project defect prediction models on a large scale and identified factors that do influence the success of cross-project predictions, and derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
Journal ArticleDOI

Classifying Software Changes: Clean or Buggy?

TL;DR: A description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.
Journal ArticleDOI

Evaluating defect prediction approaches: a benchmark and an extensive comparison

TL;DR: The results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
References
Book

Generalized Linear Models

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Book

A complexity measure

TL;DR: In this paper, a graph-theoretic complexity measure for managing and controlling program complexity is presented. The complexity is independent of physical size and depends only on the decision structure of a program.
Journal ArticleDOI

A Complexity Measure

TL;DR: Several properties of the graph-theoretic complexity are proved which show, for example, that complexity is independent of physical size and complexity depends only on the decision structure of a program.
Journal ArticleDOI

Predicting fault incidence using software change history

TL;DR: This paper uses change management data from a very large, long-lived software system to explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults.