Where the bugs are

doi:10.1145/1007512.1007524

Proceedings Article

Vol. 29, Iss. 4, pp. 86-96

TLDR

A negative binomial regression model using information from previous releases was developed and used to predict the numbers of faults in a large industrial inventory system; its predictions were extremely accurate.

Abstract:

The ability to predict which files in a large software system are most likely to contain the largest numbers of faults in the next release can be a very valuable asset. To accomplish this, a negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system. The files of each release were sorted in descending order based on the predicted number of faults and then the first 20% of the files were selected. This was done for each of fifteen consecutive releases, representing more than four years of field usage. The predictions were extremely accurate, correctly selecting files that contained between 71% and 92% of the faults, with the overall average being 83%. In addition, the same model was used on data for the same system's releases, but with all fault data prior to integration testing removed. The prediction was again very accurate, ranging from 71% to 93%, with the average being 84%. Predictions were made for a second system, and again the first 20% of files accounted for 83% of the identified faults. Finally, a highly simplified predictor was considered which correctly predicted 73% and 74% of the faults for the two systems.
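
The evaluation procedure described in the abstract (sort files by predicted fault count, select the first 20%, and measure the share of actual faults those files contain) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `top_fraction_fault_share` and the sample numbers are hypothetical, and the predicted counts are assumed to come from some already-fitted model such as a negative binomial regression.

```python
def top_fraction_fault_share(predicted, actual, fraction=0.20):
    """Return the share of actual faults found in the files that rank
    highest by predicted fault count.

    predicted, actual: per-file fault counts, aligned by index.
    fraction: portion of files to select (0.20 mirrors the abstract).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total_faults = sum(actual)
    if not predicted or total_faults == 0:
        return 0.0

    # Rank file indices in descending order of predicted fault count.
    ranked = sorted(range(len(predicted)),
                    key=lambda i: predicted[i],
                    reverse=True)

    # Select the first `fraction` of the ranked files (at least one file).
    n_selected = max(1, round(fraction * len(ranked)))
    selected = ranked[:n_selected]

    # Share of all actual faults that fall inside the selected files.
    return sum(actual[i] for i in selected) / total_faults


# Hypothetical data for ten files (not taken from the paper).
predicted = [5.2, 0.1, 3.8, 0.4, 0.0, 1.1, 0.3, 0.2, 0.9, 0.05]
actual = [6, 0, 3, 1, 0, 1, 0, 0, 1, 0]

share = top_fraction_fault_share(predicted, actual)
print(f"Top 20% of files contain {share:.0%} of the faults")
```

Applying the same function to each release in turn would give per-release percentages comparable in kind to the 71% to 92% range reported above.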

Citations

Journal Article

A Systematic Literature Review on Fault Prediction Performance in Software Engineering

Tracy Hall, +4 more

- 01 Nov 2012 -

IEEE Transactions on Software Engineerin...

TL;DR: Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and that report their context, methodology, and performance comprehensively.

Proceedings Article

Use of relative code churn measures to predict system defect density

Nachiappan Nagappan, +1 more

TL;DR: A technique for early prediction of system defect density using a set of relative code churn measures that relate the amount of churn to other variables such as component size and the temporal extent of churn. The results show that while absolute measures of code churn are poor predictors of defect density, the relative measures are highly predictive of defect density.

Proceedings Article

Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

Thomas Zimmermann, +4 more

TL;DR: This paper studied cross-project defect prediction models on a large scale and identified factors that do influence the success of cross-project predictions, and derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.

Journal Article

Classifying Software Changes: Clean or Buggy?

Sunghun Kim, +2 more

- 01 Mar 2008 -

IEEE Transactions on Software Engineerin...

TL;DR: A description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.

Journal Article

Evaluating defect prediction approaches: a benchmark and an extensive comparison

Marco D'Ambros, +2 more

- 01 Aug 2012 -

Empirical Software Engineering

TL;DR: The results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.

References

Book

Generalized Linear Models

Peter McCullagh, +1 more

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).

Book

Generalized Linear Models

John H. Schuenemeyer, +2 more

Book

A complexity measure

Thomas J. McCabe

TL;DR: In this paper, a graph-theoretic complexity measure for managing and controlling program complexity is presented. The measure is independent of physical size and depends only on the decision structure of a program.

Journal Article

A Complexity Measure

Thomas J. McCabe

- 01 Jul 1976 -

IEEE Transactions on Software Engineerin...

TL;DR: Several properties of the graph-theoretic complexity are proved which show, for example, that complexity is independent of physical size and complexity depends only on the decision structure of a program.

Journal Article

Predicting fault incidence using software change history

T.L. Graves, +3 more

- 01 Jul 2000 -

IEEE Transactions on Software Engineerin...

TL;DR: This paper uses change management data from a very large, long-lived software system to explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults.
