Towards identifying software project clusters with regard to defect prediction
References
A metrics suite for object oriented design
A complexity measure
A critique of software defect prediction models
A hierarchical model for object-oriented design quality assessment
Frequently Asked Questions
Q2. What have the authors stated about future work in "Towards identifying software project clusters with regard to defect prediction"?
Further research is necessary to identify more clusters. The clusters that were identified are very wide, so it may be possible to divide them successfully into smaller ones. A cross-validation of the study could also be conducted.
Q3. What general assumptions should be checked before using a parametric test?
The following general assumptions should be checked before using a parametric test: the level of measurement (the variables must be measured on an interval or ratio scale), independence of observations, homogeneity of variance, and normal distribution of the sample.
Q4. How was a model's efficiency in predicting defects in a project release evaluated?
To evaluate a model's efficiency in predicting defects in a given release of a project, all classes belonging to that release were sorted according to the model output.
Q5. What was the criterion for one project being a strong predictor for another?
A project was considered a strong predictor for another project when precision, recall, and accuracy were all greater than 0.75.
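The criterion can be expressed as a small check over a confusion matrix. This is a sketch assuming the model output has already been turned into a binary defective/non-defective label for each class, which the answer above does not describe; the names `ConfusionMatrix` and `is_strong_predictor` are hypothetical, and the counts in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for binary defective / non-defective class labels (hypothetical helper)."""
    tp: int  # defective classes predicted as defective
    fp: int  # clean classes predicted as defective
    tn: int  # clean classes predicted as clean
    fn: int  # defective classes predicted as clean

    # Each metric returns 0.0 on an empty denominator (an assumed convention).
    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0


def is_strong_predictor(cm: ConfusionMatrix, threshold: float = 0.75) -> bool:
    """True when precision, recall and accuracy all exceed the threshold."""
    return all(m > threshold for m in (cm.precision(), cm.recall(), cm.accuracy()))


# Model built on project A, applied to a release of project B (made-up counts).
print(is_strong_predictor(ConfusionMatrix(tp=40, fp=5, tn=45, fn=10)))  # True
print(is_strong_predictor(ConfusionMatrix(tp=30, fp=5, tn=45, fn=20)))  # False: recall is 0.6
```

Requiring all three metrics to pass means a predictor that flags almost every class (high recall, low precision) is not counted as strong.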
Q6. What is the typical approach in studies of defect prediction models?
The typical approach in studies of defect prediction models is to build a model from data on an older version of a project and then validate or use this model on a newer version of the same project.
Q7. Which test is used to check the homogeneity of variance?
The homogeneity of variance is checked by Levene's test, while the assumption that the sample came from a normally distributed population is tested by the Shapiro-Wilk test [13].
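To make the homogeneity check concrete, here is a minimal pure-Python sketch of the Levene statistic W in its original mean-centred form; the function name `levene_w` is hypothetical. W is compared with the critical value of the F distribution with (k - 1, N - k) degrees of freedom, where k is the number of groups and N the total number of observations, and a large W suggests unequal variances. In practice a statistics library would normally be used: SciPy provides `scipy.stats.levene` (whose default `center='median'` gives the Brown-Forsythe variant) and `scipy.stats.shapiro` for the Shapiro-Wilk test.

```python
from statistics import mean


def levene_w(*groups: list[float]) -> float:
    """Levene's test statistic W, using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    # Z_ij = |Y_ij - mean of group i|
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    # Spread of the group means of Z around the grand mean (between groups) ...
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    # ... relative to the spread of Z inside each group (within groups).
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    if within == 0:
        raise ValueError("W is undefined when deviations are constant within every group")
    return (n_total - k) / (k - 1) * between / within


print(levene_w([1, 2, 3], [11, 12, 13]))  # 0.0: same spread, only shifted
print(levene_w([1, 2, 3], [0, 5, 10]))    # about 2.46: second group is more spread out
```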
Q8. What metric is used to measure the relatedness among the methods of a class?
This is the cohesion among methods of class (CAM) metric. It is computed as the sum, over all methods, of the number of distinct parameter types in each method, divided by the product of the number of distinct method parameter types in the whole class and the number of methods.
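The formula can be sketched as follows, assuming each method is represented only by the list of its parameter type names; the function `cam` and this input format are hypothetical. With m methods, P_i the set of distinct parameter types of method i, and T the set of distinct parameter types in the whole class, the value is the sum of |P_i| divided by (|T| * m).

```python
def cam(method_param_types: list[list[str]]) -> float:
    """Cohesion among methods of class: sum of distinct parameter types per method
    divided by (distinct parameter types in the class * number of methods)."""
    if not method_param_types:
        raise ValueError("class has no methods")
    # Distinct parameter types of each method (a repeated type counts once).
    per_method = [set(params) for params in method_param_types]
    class_types = set().union(*per_method)
    if not class_types:
        # No method takes parameters: the formula would divide by zero.
        raise ValueError("CAM is undefined when no method has parameters")
    return sum(len(p) for p in per_method) / (len(class_types) * len(per_method))


# Three methods: (int, String), (int, int), (String, double)
print(cam([["int", "String"], ["int", "int"], ["String", "double"]]))  # 5 / 9, about 0.556
```

A value of 1.0 means every method uses every parameter type found in the class; lower values indicate methods that work on unrelated kinds of data.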
Q9. How is the lack of cohesion in methods (LCOM) metric calculated?
The lack of cohesion in methods (LCOM) is calculated by subtracting the number of method pairs that share a field access from the number of method pairs that do not share one.
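A sketch of this pair-counting definition, assuming each method is described only by the set of fields it accesses; the function `lcom` and its input format are hypothetical. Chidamber and Kemerer's original definition sets the result to zero when sharing pairs outnumber non-sharing pairs, and the sketch follows that convention.

```python
from itertools import combinations


def lcom(fields_by_method: dict[str, set[str]]) -> int:
    """LCOM = (pairs sharing no field) - (pairs sharing a field), floored at 0."""
    disjoint = shared = 0
    # Consider every unordered pair of methods exactly once.
    for a, b in combinations(fields_by_method.values(), 2):
        if a & b:
            shared += 1
        else:
            disjoint += 1
    return max(disjoint - shared, 0)


methods = {
    "getX": {"x"},
    "move": {"x", "y"},
    "rename": {"name"},
}
print(lcom(methods))  # 1: one pair shares "x", two pairs share nothing
```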
Q10. How many classes were visited in order to find 80% of defects?
The number of classes that must be visited in order to find 80% of defects was calculated and used as the measure of the model's efficiency in predicting defects in a given release of the project.
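The evaluation procedure from Q4 and Q10 can be sketched as follows. It assumes that classes are sorted in descending order of the model's predicted number of defects and that ties keep their original order; neither detail is stated above, and the function name `share_of_classes_to_find` is hypothetical. Integer arithmetic is used for the 80% threshold to avoid floating-point rounding at the boundary.

```python
def share_of_classes_to_find(predicted: list[float], actual: list[int], percent: int = 80) -> float:
    """Fraction of classes that must be visited, in order of decreasing predicted
    defects, until `percent` % of the actual defects in the release are found."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must describe the same classes")
    total = sum(actual)
    if total == 0:
        raise ValueError("release has no known defects")
    # Visit classes with the highest model output first (the sort is stable, so ties keep input order).
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    found = 0
    for visited, i in enumerate(order, start=1):
        found += actual[i]
        if 100 * found >= percent * total:  # found / total >= percent / 100
            return visited / len(predicted)
    return 1.0  # only reached if percent > 100


predicted = [0.2, 3.1, 0.0, 1.4, 0.7]  # model output per class
actual = [0, 4, 1, 2, 3]               # defects later found per class
print(share_of_classes_to_find(predicted, actual))  # 0.6, i.e. 60% of the classes
```

Multiplying the result by 100 gives percentages in the same form as the values reported in Table 3; a lower value means a more efficient model.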
Q11. What was the level of automation in the testing process?
A high level of automation in the testing process was applied in most cases (data about the testing process were not available for all releases), and SVN repositories were used as the source code version control system in all of them.
Q12. Why is it difficult to reproduce the study in an industrial environment?
Reproducing the study in an industrial environment is difficult because constructing the correlation vectors requires information about the very defects that one is trying to predict.
Q13. How is it determined whether a commit is a bugfix?
The BugInfo tool analyses the logs from the source code repository (SVN or CVS) and, based on the log content, decides whether a commit is a bugfix.
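The exact rules BugInfo applies are not given here. As an illustration only, a commit can be classified with a keyword heuristic over its log message; the regular expression and the function name `is_bugfix` below are hypothetical assumptions, not BugInfo's actual configuration.

```python
import re

# Hypothetical heuristic (not BugInfo's real rules): the message mentions a fix,
# bug, or defect as a whole word, or references an issue number such as "#123".
BUGFIX_PATTERN = re.compile(r"\b(?:fix(?:e[sd])?|bug|defect)\b|#\d+", re.IGNORECASE)


def is_bugfix(log_message: str) -> bool:
    """Return True if the commit log message looks like a bugfix."""
    return BUGFIX_PATTERN.search(log_message) is not None


log = [
    "Fixed NPE in parser",
    "Add CSV export",
    "Closes #42: wrong totals",
    "Refactor prefix handling",  # "prefix" must not count as "fix"
]
print([is_bugfix(m) for m in log])  # [True, False, True, False]
```

Keyword heuristics like this are simple but can mislabel commits, so a real tool would usually make the patterns configurable per project.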
Q14. What percentage of classes must be tested to find 80% of defects?
According to Table 3, on average 47.18% of classes must be tested to find 80% of defects when the general model is used, and 47.41% when the model for the second cluster is used.
Q15. What is the authors' opinion on the defect prediction methodology?
According to the obtained results, the authors said: “Their prediction methodology is designed for large industrial systems with a succession of releases over years of development” but later it “was successfully adapted to a system without release”.