Towards identifying software project clusters with regard
to defect prediction
Marian Jureczko
Institute of Computer Engineering, Control and Robotics
Wrocław University of Technology
Wybrzeże Wyspiańskiego 27
50-370, Wrocław - Poland
+48 71 320 27 45
marian.jureczko@pwr.wroc.pl
Lech Madeyski
Institute of Informatics
Wrocław University of Technology
Wybrzeże Wyspiańskiego 27
50-370, Wrocław - Poland
lech.madeyski@pwr.wroc.pl
http://madeyski.e-informatyka.pl/
ABSTRACT
Background: This paper describes an analysis that was conducted
on a newly collected repository with 92 versions of 38 proprietary,
open-source and academic projects. A preliminary study performed
before showed the need for a further in-depth analysis in order to
identify project clusters.
Aims: The goal of this research is to perform clustering on
software projects in order to identify groups of software projects
with similar characteristics from the defect prediction point of
view. One defect prediction model should work well for all
projects that belong to such a group. The existence of those groups
was investigated with statistical tests and by comparing the mean
value of prediction efficiency.
Method: Hierarchical and k-means clustering, as well as
Kohonen's neural network, were used to find groups of similar
projects. The obtained clusters were investigated with
discriminant analysis. For each of the identified groups a statistical
analysis was conducted in order to determine whether this
group really exists. Two defect prediction models were created for
each of the identified groups. The first one was based on the
projects that belong to a given group, and the second one on all
the projects. Then, both models were applied to all versions of
projects from the investigated group. If the predictions from the
model based on projects that belong to the identified group are
significantly better than those of the all-projects model (the mean
values were compared and statistical tests were used), we conclude
that the group really exists.
Results: Six different clusters were identified and the existence of
two of them was statistically proven: 1) cluster proprietary B –
T=19, p=0.035, r=0.40; 2) cluster proprietary/open – t(17)=3.18,
p=0.05, r=0.59. The obtained effect sizes (r) represent large
effects according to Cohen’s benchmark, which is a substantial
finding.
Conclusions: The two identified clusters were described and
compared with results obtained by other researchers. The results
of this work make a next step towards defining formal methods of
reusing defect prediction models by identifying groups of projects
within which the same defect prediction model may be used.
Furthermore, a method of clustering was suggested and applied.
Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics – complexity measures,
product metrics, software science.
General Terms
Measurement
Keywords
Defect Prediction, Design Metrics, Size Metrics, Clustering.
1. INTRODUCTION
Testing of software systems is an activity that consumes time and
resources. Applying the same testing effort to all modules of a
system is not the optimal approach, because the distribution of
defects among individual parts of a software system is not
uniform. Therefore, testers should be able to identify fault-prone
classes. With such knowledge they would be able to prioritize the
tests and, therefore, work more efficiently. According to Weyuker
et al. [24,25], typically 20% of modules contain upwards of 80%
of defects. Testers with a good defect predictor may be able to
save a lot of test effort by testing only 20% of system modules
and still finding up to 80% of the software defects. Defect
prediction studies usually use historical data of previous versions
of software to build the defect prediction models. Such an approach
can be applied neither to the first release of a software system, nor
by companies that do not collect historical data. Therefore, it is
vital to identify methods of constructing models that do not
require historical data.
Considerable research has been performed on defect
prediction methods; see the surveys by Purao and Vaishnavi [19]
or by Wahyudin et al. [23], but methods of reusing
defect prediction models have not been established yet. There are
only works where the same model has been used in similar
projects (Watanabe et al. [22], Bell, Ostrand and Weyuker
[2,18,24] or Nagappan et al. [16]), but without identifying the
borders of similarity. To the authors' knowledge, there
are only two studies where cross-project validation of defect
prediction models was performed [21,26]; both are described in
the next section. The goal of this research is to fill that gap by
identifying clusters of software projects. It should be possible to
make defect predictions for all projects that belong to one cluster
using only one defect prediction model. A preliminary study was
already conducted [11], where the existence of three clusters was
investigated: proprietary projects, open-source projects and
academic projects. Only the defect prediction model created for
the open-source cluster was statistically better. Therefore, only
one cluster was proved to exist, although it is extremely unlikely
that the other clusters do not exist. Further studies could reveal
other clusters, and it is also possible that the identified cluster
may be successfully split into several smaller clusters.
The paper is organized as follows: in Section 2 related works are
described. Section 3 presents the suite of OO metrics that were
used, the investigated projects, and the definition of the study, and
discusses threats to the validity of the study. The obtained results are
shown in Section 4. Conclusions are given in Section 6 and the
prospects for future research in Section 7.
2. RELATED WORKS
A typical approach in studies connected with defect prediction
models is to build a model according to data from an old version
of a project and then validate or use this model on a new version
of the same project. Such an approach was used [2,8,17,18,24,25] as
well as advocated [5,23] by many researchers. Some experiments
were also reported where the cross-project reusability of defect
prediction models was investigated.
Koru and Liu [12] came to interesting conclusions: “Normally,
defect prediction models will change from one development
environment to another according to specific defect patterns.” But
in their opinion, it does not mean that building a generalizable
defect prediction model is not possible. In fact, such models may
be extremely useful and may serve as a starting point in
development environments that have no historical data.
Nagappan et al. [16] extended the state of the art by
analyzing whether predictors obtained from one project history
are applicable to other projects. The authors investigated five
proprietary software projects. The performed analysis showed that
there is no single set of metrics that fits all five projects, but the
defect prediction models may be accurate when obtained from
similar projects (the similarity was not precisely defined). The
authors evaluated this problem by building one predictor for each
project and applying it to the entities of each of the other four
projects. Then the correlations between the actual and predicted
rankings were compared. It turned out that the project histories
cannot serve as predictors for other projects in most cases. The
study was extended in [26], where 622 cross-project predictions
were performed for 12 real-world applications. A project was
considered a strong predictor for another project when
precision, recall, and accuracy were all greater than 0.75. Only 21
cross-project validations satisfied this criterion, a success rate of
3.4%. Subsequently, guidelines that enable assessing the chance
of success of a cross-project prediction were given. The
guidelines were summarized in a decision tree. The authors
constructed separate trees for assessing prediction precision,
recall, and accuracy, but only the tree for precision was given in
the paper.
Watanabe et al. [22] tried to apply to a C++ project a defect
prediction model that had been constructed according to the data
from a Java project. The reusability study in the opposite
direction was conducted as well. Sakura Editor and JEdit were
used as the investigated projects. Metrics from only one release
were collected, so the authors used stratified 10-fold cross-validation
in order to compute two measures of model accuracy: precision
and recall. In intra-project prediction they obtained precisions of
0.828 and 0.733 and recalls of 0.897 and 0.702. In inter-project
prediction they obtained precisions of 0.872 and 0.622 and recalls of
0.596 and 0.402. According to the obtained results, the authors concluded
that in the case of a similar domain and a similar size, it is
possible to reuse the prediction model between languages, despite
the fact that the precision/recall is not very high. The authors admitted
that their results were based on only two projects, so the
generality is not clear, and in order to increase the level of
generalization they were going to evaluate the reusability on other
projects whose domain is text editors.
Relevant to this study are the experiments conducted by Ostrand et al.
[18], where two large industrial systems with seventeen
and nine releases, respectively, were investigated. A negative binomial
regression model was used. The predictions were based on the
source code of the current release, and the fault and modification history
from the previous release. The study was extended in [24] by
analyzing a third project (this increased the number of
programming languages used to ten). Applying the defect prediction
model to the third project gave good results: the 20% of files
predicted to contain the largest number of faults contained, on average,
83% of the faults. Further findings were presented in [25], where
the number of investigated projects was increased to four.
According to the obtained results, the authors said: “Our
prediction methodology is designed for large industrial systems
with a succession of releases over years of development” but later
it “was successfully adapted to a system without release”.
However, it must be mentioned that Weyuker et al. used a different
approach than the one presented in this paper. They had no
fixed model structure; the model equation was adjusted according
to data from the history of the analyzed system. Only the model
building procedure was fixed.
A comprehensive study of cross-company defect prediction was
conducted by Turhan et al. [21]. Ten different software projects
were investigated. Turhan et al. concluded that there is no single
set of static code features (metrics) that may serve as a defect
predictor for all software projects. The effectiveness of the defect
prediction models was measured using the probability of detection (pd)
and the probability of false alarm (pf). Cross-company defect
prediction dramatically increased the pd as well as the pf. The
authors were also able to decrease the pf by applying nearest
neighbor filtering. The similarity measure was the Euclidean
distance between the static code features. The project features that
may influence the effectiveness of cross-company predictions
were not identified.
Wahyudin et al. [23] suggested a framework for defect prediction.
In the context of their framework they discussed the possibility of
reusing historical data in defect prediction for other projects. They
concluded that: “A prediction model models the context of a
particular project. As a consequence, predictors obtained from one
project are usually not applicable to other projects”. When the
predictors are applicable, or whether there exist groups of
projects within which one predictor may be applied to all
projects, was not discussed.
3. STUDY DESIGN
3.1 Metrics and Tools
There are a number of size and complexity metrics that may be
used in defect prediction models. All metrics that are calculated
by the Ckjm tool (http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm) were
used in this study. The version of Ckjm reported in [8] was used;
this is the version that calculates 19 metrics that have been
reported as good quality indicators. Those metrics were selected
according to some reported experiments [3,17] and our own
research [9,10]. The utilized metrics come from several metrics
suites.
The metrics suite suggested by Chidamber and Kemerer [4]:
Weighted methods per class (WMC). The value of the WMC is
equal to the number of methods in the class (assuming unity
weights for all methods).
Depth of Inheritance Tree (DIT). The DIT metric provides for
each class a measure of the inheritance levels from the object
hierarchy top.
Number of Children (NOC). The NOC metric simply measures
the number of immediate descendants of the class.
Coupling between object classes (CBO). The CBO metric
represents the number of classes coupled to a given class
(efferent couplings and afferent couplings). These couplings can
occur through method calls, field accesses, inheritance, method
arguments, return types, and exceptions.
Response for a Class (RFC). The RFC metric measures the
number of different methods that can be executed when an object
of that class receives a message. Ideally, we would want to find,
for each method of the class, the methods that class will call, and
repeat this for each called method, calculating what is called the
transitive closure of the method call graph. This process can
however be both expensive and quite inaccurate. Ckjm calculates
a rough approximation to the response set by simply inspecting
method calls within the class method bodies. The value of RFC is
the sum of number of methods called within the class method
bodies and the number of class methods. This simplification was
also used in the original description of the metric.
Lack of cohesion in methods (LCOM). The LCOM metric
counts the sets of methods in a class that are not related through
the sharing of some of the class fields. The original definition of
this metric (which is the one used in Ckjm) considers all pairs of
class methods. In some of these pairs both methods access at least
one common field of the class, while in other pairs the two
methods do not share any common field accesses. The lack of
cohesion in methods is then calculated by subtracting from the
number of method pairs that do not share a field access the
number of method pairs that do.
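To make the pair-counting concrete, here is a minimal sketch; it assumes a class has already been summarized as a mapping from method names to the sets of fields each method accesses (a hypothetical representation chosen for illustration, not Ckjm's internal one):

```python
from itertools import combinations

def lcom(field_access: dict) -> int:
    """CK LCOM: (method pairs sharing no field) - (pairs sharing a field),
    floored at zero as in the original Chidamber-Kemerer definition."""
    sharing = non_sharing = 0
    for m1, m2 in combinations(field_access, 2):
        if field_access[m1] & field_access[m2]:
            sharing += 1
        else:
            non_sharing += 1
    return max(non_sharing - sharing, 0)

# Two unrelated method groups -> 4 non-sharing pairs, 2 sharing pairs -> LCOM = 2
print(lcom({"getX": {"x"}, "setX": {"x"}, "getY": {"y"}, "setY": {"y"}}))
```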
One metric suggested by Henderson-Sellers [6]:
Lack of cohesion in methods (LCOM3). The metric is defined as
LCOM3 = ((1/a) * Σ_{j=1..a} μ(A_j) − m) / (1 − m),
where: m - number of methods in a class;
a - number of attributes in a class;
μ(A_j) - number of methods that access the attribute A_j.
The metrics suite suggested by Bansiya and Davis [1]:
Number of Public Methods (NPM). The NPM metric simply
counts all the methods in a class that are declared as public. The
metric is also known as Class Interface Size (CIS).
Data Access Metric (DAM). This metric is the ratio of the
number of private (protected) attributes to the total number of
attributes declared in the class.
Measure of Aggregation (MOA). The metric measures the
extent of the part-whole relationship, realized by using attributes.
The metric is a count of the number of class fields whose types
are user defined classes.
Measure of Functional Abstraction (MFA). This metric is the
ratio of the number of methods inherited by a class to the total
number of methods accessible by the member methods of the
class. The constructors and the java.lang.Object (as parent) are
ignored.
Cohesion Among Methods of Class (CAM). This metric
computes the relatedness among methods of a class based upon
the parameter lists of the methods. The metric is computed as the
sum of the number of different types of method parameters in
every method, divided by the product of the number of different
method parameter types in the whole class and the number of
methods.
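A small sketch of that ratio, assuming each method is represented simply by the set of its distinct parameter types (a simplified illustration of the description above, not the full QMOOD definition):

```python
def cam(param_types_per_method: list) -> float:
    """Cohesion Among Methods: sum over methods of the count of distinct
    parameter types, divided by (distinct parameter types in the whole
    class * number of methods)."""
    if not param_types_per_method:
        return 0.0
    class_types = set().union(*param_types_per_method)
    if not class_types:
        return 0.0
    return (sum(len(t) for t in param_types_per_method)
            / (len(class_types) * len(param_types_per_method)))

# move(int, int), scale(float), reset() -> (1 + 1 + 0) / (2 types * 3 methods) = 0.33
print(round(cam([{"int"}, {"float"}, set()]), 2))
```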
The quality oriented extension to Chidamber & Kemerer metrics
suite suggested by Tang et al. [20]:
Inheritance Coupling (IC). This metric provides the number of
parent classes to which a given class is coupled. A class is
coupled to its parent class if one of its inherited methods is
functionally dependent on the new or redefined methods in the
class. A class is coupled to its parent class if one of the following
conditions is satisfied:
- One of its inherited methods uses an attribute that is defined in
a new/redefined method.
- One of its inherited methods calls a redefined method.
- One of its inherited methods is called by a redefined method
and uses a parameter that is defined in the redefined method.
Coupling Between Methods (CBM). The metric measures the
total number of new/redefined methods to which all the inherited
methods are coupled. There is a coupling when at least one of the
conditions given for the IC metric is satisfied.
Average Method Complexity (AMC). This metric measures the
average method size for each class. The size of a method is equal
to the number of Java bytecodes in the method.
Two metrics suggested by Martin [15]:
Afferent couplings (Ca). The Ca metric represents the number
of classes that depend upon the measured class.
Efferent couplings (Ce). The Ce metric represents the number
of classes that the measured class depends upon.
One of McCabe's metrics [14]:
McCabe's cyclomatic complexity (CC). CC is equal to the
number of different paths in a method (function) plus one. The
cyclomatic complexity is defined as CC = E−N+P, where E is the
number of edges of the graph, N is the number of nodes of the
graph, and P is the number of connected components. CC is the only
method-level size metric. The constructed models make class-level
predictions; therefore, the metric had to be converted to a class-level
metric. Two metrics have been derived:

- Max(CC) - the greatest value of CC among methods of the
investigated class.
- Avg(CC) - the arithmetic mean of the CC value in the
investigated class.
Those metrics were complemented with one more, very popular
metric:
Lines of Code (LOC). The LOC metric calculates the number
of lines of code in the Java binary code of the class under
investigation.
The information about defect occurrences was collected with a
tool called BugInfo. BugInfo analyses the logs from the source code
repository (SVN or CVS) and, according to the log content, decides
whether a commit is a bugfix. A commit is interpreted as a bugfix
when it solves an issue reported in the bug tracking system. Each
of the projects had been investigated in order to identify the bugfix
commenting guidelines that were used in the source code
repository. The guidelines were formalized as regular expressions.
BugInfo compares the regular expressions with the comments of the
commits. When a comment matches the regular expression,
BugInfo increments the defect count for all classes that have been
modified in the commit. The BugInfo tool has had no official
release yet, but we are going to implement some improvements,
especially in the user interface, and then make an official release.
Its current version is available at: http://kenai.com/projects/buginfo.
There is no formal evaluation regarding the efficiency of this tool
in mapping defects yet, but comprehensive functional tests were
conducted and many of the tests are available as JUnit tests in the
source code package. All collected data is available online at:
http://purl.org/MarianJureczko/MetricsRepo.
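The matching step can be illustrated with a short sketch; the log format, the example regular expression, and the function below are invented for illustration and do not reflect BugInfo's actual code:

```python
import re
from collections import defaultdict

# Hypothetical bugfix-commenting guideline of one project, e.g. "Fixed bug #123: ..."
BUGFIX_RE = re.compile(r"fix(ed)?\s+bug\s+#\d+", re.IGNORECASE)

def defect_counts(commits):
    """commits: iterable of (comment, modified_classes) pairs taken from the
    SVN/CVS log. A commit whose comment matches the project's bugfix pattern
    increments the defect count of every class modified in that commit."""
    counts = defaultdict(int)
    for comment, classes in commits:
        if BUGFIX_RE.search(comment):
            for cls in classes:
                counts[cls] += 1
    return dict(counts)

log = [("Fixed bug #101: NPE in the parser", ["org.example.Parser"]),
       ("code cleanup", ["org.example.Parser", "org.example.Lexer"])]
print(defect_counts(log))  # {'org.example.Parser': 1}
```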
3.2 Investigated projects
48 releases of 15 open source projects were investigated: Apache
Ant (1.3 – 1.7), Apache Camel (1.0 – 1.6), Ckjm (1.8), Apache
Forrest (0.6 – 0.8), Apache Ivy (1.1 – 2.0), JEdit (3.2.1 – 4.3),
Apache Log4j (1.0 – 1.2), Apache Lucene (2.0 – 2.2), PBeans (1.0
and 2.0), Apache POI (1.5 – 3.0), Apache Synapse (1.0 – 1.2),
Apache Tomcat (6.0), Apache Velocity (1.4 – 1.6.1), Apache
Xalan-Java (2.4.0 – 2.7.0), Apache Xerces (1.1.0 – 1.4.4). A more
comprehensive discussion of most of those projects was given in
[8].
27 releases of 6 proprietary software projects were investigated.
Five of them are custom-built solutions that had already been
successfully installed in the customer environment. Those five
projects belong to the same domain: insurance. The 6th
proprietary project is a standard tool that supports quality
assurance in software development. All six projects were
developed by the same company.
Moreover, 17 academic software projects were investigated. Each
of them had exactly one release. Those projects were
implemented by 8th or 9th semester computer science students.
The students worked in groups of 3 to 6 persons during one year.
A highly iterative software development process was used.
UML documentation was prepared and a high level of test code
coverage was obtained for each of those projects. JUnit and
FitNesse were used as test tools. Some of those projects had
already been investigated in [9,10].
All of the investigated projects were written in Java.
3.3 Analysis method employed
It had been assumed that the character of a defect predictor strongly
depends on the correlation between the metrics and the number of
defects in a class. A correlation vector was calculated for each of the
investigated releases of the projects. The correlations between each
of the metrics (the metrics are given in 3.1) and the number of defects
were calculated. The vectors were then extended by adding the
ratio of defects per class.
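As a sketch of this step, assuming the data of one release is available as a pandas DataFrame with one row per class, the metric columns, and a 'defects' column (the column names and the use of pandas' default Pearson correlation are our assumptions, not taken from the paper):

```python
import pandas as pd

METRIC_COLUMNS = ["wmc", "dit", "noc", "cbo", "rfc", "lcom", "loc"]  # subset of the 19 metrics

def correlation_vector(release: pd.DataFrame) -> pd.Series:
    """One vector per project release: the correlation of every metric with
    the number of defects, extended with the ratio of defects per class."""
    vector = release[METRIC_COLUMNS].corrwith(release["defects"])
    vector["defects_per_class"] = release["defects"].sum() / len(release)
    return vector
```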
In order to uncover the project clusters, a hierarchical clustering
procedure and then k-means clustering were used. The complete
linkage clustering indicated a two-group solution. Additionally,
Kohonen's neural network was used. The results returned by the
Kohonen's neural network differ between separate runs of the
network. Therefore, the network was executed several times, and
those releases of projects that were predominantly classified into
the same neuron (cluster) were later investigated in order to
determine whether it is a cluster from the defect prediction point
of view. The obtained results were investigated with
discriminant analysis. Several configurations of the
Kohonen network with different numbers of output neurons
were used, but no more than 4 clusters were obtained, even when
the number of output neurons was increased up to 16.
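A rough sketch of the clustering step over those correlation vectors, using scipy and scikit-learn as stand-ins for whatever tools were actually used (the library choice and parameter values are ours; a Kohonen self-organizing map would additionally need a dedicated library such as MiniSom):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

def cluster_releases(vectors: np.ndarray, k: int = 2):
    """vectors: one row per project release (metric-defect correlations plus
    the defect ratio). Complete-linkage hierarchical clustering suggests the
    number of groups; k-means then produces the final partition."""
    hierarchical = fcluster(linkage(vectors, method="complete"),
                            t=k, criterion="maxclust")
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
    return hierarchical, kmeans.labels_
```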
For each of the identified clusters a defect prediction model was
created. In order to create the model, all metrics were used and
stepwise linear regression was applied. Due to the stepwise
regression, a typical model used five to ten metrics (not all of
them). Subsequently, the models were evaluated by applying them
to all releases of projects that belonged to the investigated cluster.
In order to evaluate the efficiency of a model in predicting defects
in a release of a project, all classes that belong to the given
release were sorted according to the model output, in descending
order of the predicted number of defects. Next, the
number of classes that must be visited in order to find 80% of the
defects was calculated and used as the model's efficiency in
predicting defects in the given release of the project. A general
defect prediction model was built too. The general model used
data from all the releases of all the projects as the training set. In
order to determine whether a cluster exists from the defect
prediction point of view, the efficiency of the model created for the
cluster was compared with the efficiency of the general model.
Those two models were applied only to those releases of software
projects that belonged to the investigated cluster. When the
efficiency of the model created for the cluster is significantly
better than the efficiency of the general model, one may assume
that the cluster exists. In order to investigate whether the
difference was significant, statistical tests were used.
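The evaluation measure can be written down as a short sketch (per-release vectors of predicted and actual defects per class; expressing the result as a percentage of classes, as it is reported in the tables below, is our reading of the procedure):

```python
import numpy as np

def efficiency(predicted: np.ndarray, actual: np.ndarray) -> float:
    """Percentage of classes that must be inspected, in descending order of
    predicted defects, before 80% of the actual defects are found."""
    order = np.argsort(-predicted)                 # descending predicted defects
    cumulative = np.cumsum(actual[order])
    # smallest k with D_k > 0.8 * D_n (strict inequality, as in the definition below)
    k = np.searchsorted(cumulative, 0.8 * actual.sum(), side="right") + 1
    return 100.0 * k / len(actual)

# 4 classes, defects concentrated in the two most fault-prone predictions -> 50.0
print(efficiency(np.array([5.0, 3.0, 1.0, 0.0]), np.array([3.0, 1.0, 0.0, 0.0])))
```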
To put this more formally, assume that R is the set of all releases of all
projects and r is a single release of a project. C is the set of all r that
were selected into a cluster; C is a subset of R (C ⊆ R). There are two
defect prediction models, M_R and M_C. M_R is the general model that
was trained with all r ∈ R. M_C is a cluster model that was trained with
all r ∈ C. E(M, r) is the evaluation of the efficiency of model M in
predicting defects in release r. Let c_1, c_2, …, c_n be the classes from
release r in descending order of predicted defects according to the model
M_X, and d_1, d_2, …, d_n be the number of defects in each class.
D_i = d_1 + … + d_i, i.e., the total number of defects in the first i classes.
Let k be the smallest index such that D_k > 0.8 * D_n; then E(M_X, r) = k
(in the results this value is reported as a percentage of all classes in the
release).
E(M_R, r) and E(M_C, r) were calculated for all r ∈ C. In order to
decide whether the cluster exists from the defect prediction point
of view, hypotheses must be defined:
H_0 – There is no difference in the efficiency of defect prediction
between the general model and the cluster model:
E(M_R, r) = E(M_C, r) for r ∈ C.
H_1 – There is a difference in the efficiency of defect prediction
between the general model and the cluster model:
E(M_R, r) > E(M_C, r) for r ∈ C.
The hypotheses are evaluated by the parametric t-test for
dependent samples. The following general assumptions should be
checked in order to use a parametric test: level of measurement
(the variables must be measured at the interval or ratio level),
independence of observations, homogeneity of variance
and normal distribution of the sample. The homogeneity of
variance is checked by Levene's test, while the assumption that
the sample comes from a normally distributed population is tested
by the Shapiro-Wilk test [13]. When some of the assumptions are
violated, the Wilcoxon matched-pairs test is used.
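The assumption checks and test selection, sketched with scipy.stats (the 0.05 threshold for the assumption checks and the one-sided alternative matching H1 are our reading of the procedure, not explicit parameters from the paper):

```python
from scipy import stats

def compare_models(e_general, e_cluster, alpha: float = 0.05):
    """Paired comparison of E(M_R, r) and E(M_C, r) over the releases of a
    cluster. Uses the dependent-samples t-test when the normality and
    homogeneity-of-variance assumptions hold, the Wilcoxon matched-pairs
    test otherwise."""
    normal = (stats.shapiro(e_general).pvalue > alpha
              and stats.shapiro(e_cluster).pvalue > alpha)
    equal_variance = stats.levene(e_general, e_cluster).pvalue > alpha
    if normal and equal_variance:
        # one-sided: the cluster model should need fewer classes than the general one
        return stats.ttest_rel(e_general, e_cluster, alternative="greater")
    return stats.wilcoxon(e_general, e_cluster, alternative="greater")
```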
There is an overlap between the training and testing sets. In order to
avoid this overlap, a separate model would have to be created for each
of the releases from the investigated cluster: M_{C-r}. In such a case we
would get n different models (where n is the number of cluster
members) and each of the models would use a different set of
releases as the training set. As a result, the definition of the
cluster would be fuzzy. On the other hand, excluding one release
from the training set affects the model very slightly. Therefore,
we decided to use the overlapping approach.
3.4 Threats to validity
A number of limitations that may compromise to some extent the
quality of the results of this study are listed below.
It is possible that there are mistakes in the defect identification.
The comments in the source code version control system are not
always well written and, therefore, it was sometimes very hard to
decide whether a change is connected with a defect or not. In
some cases the comment could be cross-checked against the bug
tracking system, but unfortunately this was not possible for all projects.
The defects are assigned to classes according to the bugfix date. It
would probably be better to assign the defect to the version in which
the defect was found, but unfortunately the source code
version control system does not contain such information.
We were not able to track operations like changing a class name or
moving a class between packages. Therefore, after such a change,
the class is interpreted as a new class. Similar difficulties were
created by anonymous classes; hence, anonymous classes
were ignored in the analysis.
The defects are identified according to the comments in the
source code version control system. The guidelines of
commenting bugfixes may vary among different projects.
Therefore, it is possible that interpretation of the term defect is
not unique among the investigated projects.
4. RESULTS
The results of two different approaches to clustering, using
hierarchical and k-means clustering as well as Kohonen's neural
network, are presented below.
4.1 Study 1 – two clusters
In the first study all the releases of all the projects were divided
into two clusters, since the complete linkage hierarchical
clustering has suggested the possibility of a “natural” partition
into two sets of projects. Hence, the k-means two group solution
is analyzed and the results are presented in Tables 1-3.
Table 1. Descriptive statistics – cluster 1st of 2

                      Num. of cases   Mean    Std deviation
E(M_R, r): r ∈ C      61              49.73   19.64
E(M_C, r): r ∈ C      61              49.67   18.37

Table 2. Hypothesis tests – cluster 1st of 2

                          E(M_R, r): r ∈ C    E(M_C, r): r ∈ C
Shapiro-Wilk test   W     0.987               0.991
                    p     0.782               0.931
Levene's test       df        118
                    F(1,df)   0.434
                    p         0.511
T-test              T         0.057
                    df        60
                    p         0.954

According to Tables 1-2, the cluster 1st of 2 does not exist from
the defect prediction point of view.
Table 3. Descriptive statistics – cluster 2nd of 2

                      Num. of cases   Mean    Std deviation
E(M_R, r): r ∈ C      31              47.18   17.80
E(M_C, r): r ∈ C      31              47.41   17.29

According to Table 3, on average 47.18% of classes must be
tested in order to find 80% of defects when the general model is
used, and 47.41% of classes when the 2nd cluster model is used.
Therefore, the mean efficiency of the 2nd cluster model was worse
than the mean efficiency of the general model. In consequence,
there is no point in testing the hypothesis.
The conducted analysis showed that neither of the two investigated
clusters exists from the defect prediction point of view.
4.2 Study 2 – Kohonen's neural network
In the second approach Kohonen's neural network was used. Four
clusters were identified according to the network's output. Some
releases were not classified into any of those clusters.
