Heterogeneous Defect Prediction
Jaechang Nam and Sunghun Kim
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Hong Kong, China
{jcnam,hunkim}@cse.ust.hk
ABSTRACT
Software defect prediction is one of the most active research
areas in software engineering. We can build a prediction
model with defect data collected from a software project
and predict defects in the same project, i.e. within-project
defect prediction (WPDP). Researchers also proposed cross-
project defect prediction (CPDP) to predict defects for new
projects lacking in defect data by using prediction models
built by other projects. Recent studies have shown CPDP
to be feasible. However, CPDP requires projects that have
the same metric set, meaning the metric sets should be iden-
tical between projects. As a result, current techniques for
CPDP are difficult to apply across projects with heteroge-
neous metric sets.
To address the limitation, we propose heterogeneous de-
fect prediction (HDP) to predict defects across projects with
heterogeneous metric sets. Our HDP approach conducts
metric selection and metric matching to build a prediction
model between projects with heterogeneous metric sets. Our
empirical study on 28 subjects shows that about 68% of pre-
dictions using our approach outperform or are comparable
to WPDP with statistical significance.
Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management—software qual-
ity assurance
General Terms
Algorithm, Experimentation
Keywords
Defect prediction, quality assurance, heterogeneous metrics
1. INTRODUCTION
Software defect prediction is one of the most active re-
search areas in software engineering [8, 9, 24, 25, 26, 36,
37, 43, 47, 58, 59]. If software quality assurance teams can
predict defects before releasing a software product, they can
effectively allocate limited resources for quality control [36,
38, 43, 58]. For example, Ostrand et al. applied defect pre-
diction in two large software systems of AT&T for effective
and efficient testing activities [38].
Most defect prediction models are based on machine learning; therefore, collecting defect datasets to train a prediction model is essential [8, 36]. The defect datasets consist of various software metrics and labels. Commonly used software metrics for defect prediction are complexity metrics (such as lines of code, Halstead metrics, McCabe's cyclomatic complexity, and CK metrics) and process metrics [2, 16, 32, 42].
Labels indicate whether the source code is buggy or clean
for binary classification [24, 37].
Most proposed defect prediction models have been evalu-
ated on within-project defect prediction (WPDP) settings [8,
24, 36]. In Figure 1a, each instance representing a source
code file or function consists of software metric values and is
labeled as buggy or clean. In the WPDP setting, a prediction model is trained using the labeled instances in Project A and predicts unlabeled ('?') instances in the same project as buggy or clean.
However, it is difficult to build a prediction model for new
software projects or projects with little historical informa-
tion [59] since they do not have enough training instances.
Since process metrics and label information are extracted from the historical data of software repositories such as version control and issue tracking systems [42], it is difficult to collect process metrics and instance labels in new projects or projects that have little historical data [9, 37, 59]. For example, without instances labeled using past defect data, it is not possible to build a prediction model.
To address this issue, researchers have proposed cross-
project defect prediction (CPDP) [19, 29, 37, 43, 51, 59].
CPDP approaches predict defects even for new projects lack-
ing in historical data by reusing prediction models built by
other project datasets. As shown in Figure 1b, a prediction
model is trained by labeled instances in Project A (source)
and predicts defects in Project B (target).
However, most CPDP approaches have a serious limita-
tion: CPDP is only feasible for projects which have exactly
the same metric set as shown in Figure 1b. Finding other
projects with exactly the same metric set can be challenging.
Publicly available defect datasets that are widely used in de-
fect prediction literature usually have heterogeneous metric
sets [8, 35, 37]. For example, many NASA datasets in the PROMISE repository have 37 metrics but AEEEM datasets used by D'Ambros et al. have 61 metrics [8, 35].

Figure 1: Various Defect Prediction Scenarios: (a) Within-Project Defect Prediction (WPDP), (b) Cross-Project Defect Prediction (CPDP) between projects with the same metric set, and (c) Heterogeneous Defect Prediction (HDP) between projects with heterogeneous metric sets.

The
only common metric between NASA and AEEEM datasets
is lines of code (LOC). CPDP between NASA and AEEEM
datasets with all metric sets is not feasible since they have
completely different metrics [51].
Some CPDP studies use only common metrics when source
and target datasets have heterogeneous metric sets [29, 51].
For example, Turhan et al. use only the 17 common metrics between the NASA and SOFTLAB datasets, which have heterogeneous metric sets [51]. However, finding other projects
with multiple common metrics can be challenging. As men-
tioned, there is only one common metric between NASA and
AEEEM. Also, only using common metrics may degrade the
performance of CPDP models. That is because some in-
formative metrics necessary for building a good prediction
model may not be in the common metrics across datasets.
For example, in the study of Turhan et al., the performance
of CPDP (0.35) by their approach did not outperform that
of WPDP (0.39) in terms of the average f-measure [51].
In this paper, we propose the heterogeneous defect predic-
tion (HDP) approach to predict defects across projects even
with heterogeneous metric sets. If the proposed approach is
feasible as in Figure 1c, we could reuse any existing defect
datasets to build a prediction model. For example, many PROMISE defect datasets, even though they have heterogeneous metric sets [35], could be used as training datasets to predict defects in any project.
The key idea of our HDP approach is matching metrics
that have similar distributions between source and target
datasets. In addition, we apply metric selection to remove less informative metrics from a source dataset before metric matching.
Our empirical study shows that HDP models are feasible
and their prediction performance is promising. About 68%
of HDP predictions outperform or are comparable to WPDP
predictions with statistical significance.
Our contributions are as follows:
- Propose the heterogeneous defect prediction models.
- Conduct an extensive, large-scale empirical study to evaluate the heterogeneous defect prediction models.
2. BACKGROUND AND RELATED WORK
CPDP approaches have recently been studied by many researchers [29, 37, 43, 51, 59]. Since the performance
of CPDP is usually very poor [59], researchers have proposed
various techniques to improve CPDP [29, 37, 51, 54].
Watanabe et al. proposed the metric compensation ap-
proach for CPDP [54]. The metric compensation transforms a target dataset to be similar to a source dataset by using average metric values [54]. To evaluate the performance of the
metric compensation, Watanabe et al. collected two defect
datasets with the same metric set (8 object-oriented metrics)
from two software projects and then conducted CPDP [54].
Rahman et al. evaluated the CPDP performance in terms
of cost-effectiveness and confirmed that the prediction per-
formance of CPDP is comparable to WPDP [43]. For the
empirical study, Rahman et al. collected 9 datasets with the
same process metric set [43].
Fukushima et al. conducted an empirical study of just-in-
time defect prediction in the CPDP setting [9]. They used
16 datasets with the same metric set [9]. Eleven of the datasets were provided by Kamei et al., and 5 projects were newly collected with the same metric set as the 11 datasets [9, 20].
However, collecting datasets with the same metric set
might limit CPDP. For example, if existing defect datasets
contain object-oriented metrics such as CK metrics [2], col-
lecting the same object-oriented metrics is impossible for
projects that are written in non-object-oriented languages.
Turhan et al. proposed the nearest-neighbour (NN) filter
to improve the performance of CPDP [51]. The basic idea of
the NN filter is that prediction models are built by source in-
stances that are nearest-neighbours of target instances [51].
To conduct CPDP, Turhan et al. used 10 NASA and SOFT-
LAB datasets in the PROMISE repository [35, 51].
Ma et al. proposed Transfer Naive Bayes (TNB) [29].
The TNB builds a prediction model by weighting source
instances similar to target instances [29]. Using the same
datasets used by Turhan et al., Ma et al. evaluated the
TNB models for CPDP [29, 51].
Since the datasets used in the empirical studies of Turhan
et al. and Ma et al. have heterogeneous metric sets, they
conducted CPDP using the common metrics [29, 51]. There
is another CPDP study with the top-K common metric sub-
set [17]. However, as explained in Section 1, CPDP using
common metrics is worse than WPDP [17, 51].
Nam et al. adapted a state-of-the-art transfer learning
technique called Transfer Component Analysis (TCA) and
proposed TCA+ [37]. They used 8 datasets in two groups,
ReLink and AEEEM, with 26 and 61 metrics respectively [37].

However, Nam et al. could not conduct CPDP between
ReLink and AEEEM because they have heterogeneous met-
ric sets. Since the pool of projects with the same metric set is very limited, conducting CPDP within such a group is limited as well. For example,
at most 18% of defect datasets in the PROMISE repository
have the same metric set [35]. In other words, we cannot di-
rectly conduct CPDP for the 18% of the defect datasets by
using the remaining (82%) datasets in the PROMISE repos-
itory [35]. CPDP studies conducted by Canfora et al. and Panichella et al. use only 10 Java projects with the same metric set from the PROMISE repository [4, 35, 39].
Zhang et al. proposed the universal model for CPDP [57].
The universal model is built using 1398 projects from Source-
Forge and Google code and leads to comparable prediction
results to WPDP in their experimental setting [57].
However, the universal defect prediction model may be difficult to apply to projects with heterogeneous met-
ric sets since the universal model uses 26 metrics including
code metrics, object-oriented metrics, and process metrics.
In other words, the model can only be applicable for target
datasets with the same 26 metrics. In the case where the
target project has not been developed in object-oriented lan-
guages, a universal model built using object-oriented metrics
cannot be used for the target dataset.
He et al. addressed the limitations due to heterogeneous
metric sets in CPDP studies listed above [18]. Their ap-
proach, CPDP-IFS, used distribution characteristic vectors
of an instance as metrics. The prediction performance of
their best approach is comparable to or helpful in improv-
ing regular CPDP models [18].
However, the approach by He et al. is not compared with
WPDP [18]. Although their best approach is helpful to im-
prove regular CPDP models, the evaluation might be weak
since the prediction performance of a regular CPDP is usu-
ally very poor [59]. In addition, He et al. conducted exper-
iments on only 11 projects in 3 dataset groups [18].
We propose HDP to address the above limitations caused
by projects with heterogeneous metric sets. Contrary to the
study by He et al. [18], we compare HDP to WPDP, and
HDP achieved better or comparable prediction performance
to WPDP in about 68% of predictions. In addition, we
conducted extensive experiments on 28 projects in 5 dataset
groups. In Section 3, we explain our approach in detail.
3. APPROACH
Figure 2 shows the overview of HDP based on metric se-
lection and metric matching. In the figure, we have two
datasets, Source and Target, with heterogeneous metric sets.
Each row and column of a dataset represents an instance
and a metric, respectively, and the last column represents
instance labels. As shown in the figure, the metric sets in
the source and target datasets are not identical (X1 to X4 and Y1 to Y7, respectively).
Given source and target datasets with heterogeneous metric sets, we first apply a feature selection technique to the source dataset. Feature selection is a common
approach used in machine learning for selecting a subset of
features by removing redundant and irrelevant features [13].
We apply widely used feature selection techniques for metric selection of a source dataset, as described in Section 3.1 [10, 47].
After that, source and target metrics are matched up based on their similarity, such as the similarity of their distributions or their correlation.

Figure 2: Heterogeneous defect prediction (overview). The source dataset (Project A) with metrics X1 to X4 goes through metric selection, the selected source metrics are matched to the target dataset (Project B) metrics Y1 to Y7, a prediction model is built (training) on the matched source metrics, and the model predicts (test) the labels of the target instances.

In Figure 2, three target metrics are
matched with the same number of source metrics.
After these processes, we finally arrive at a matched source
and target metric set. With the final source dataset, HDP
builds a model and predicts labels of target instances.
In the following subsections, we explain the metric selec-
tion and matching in detail.
3.1 Metric Selection in Source Datasets
For metric selection, we used various feature selection ap-
proaches widely used in defect prediction such as gain ra-
tio, chi-square, relief-F, and significance attribute evalua-
tion [10, 47]. According to benchmark studies about various
feature selection approaches, a single best feature selection
approach for all prediction models does not exist [5, 15, 28].
For this reason, we conduct experiments under different fea-
ture selection approaches. When applying feature selection
approaches, we select the top 15% of metrics, as suggested by Gao et al. [10]. In addition, we compare the prediction results with and without metric selection in the experiments.
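To make this step concrete, the following is a minimal Python sketch (not the authors' WEKA-based implementation); mutual information stands in for scorers such as gain ratio, and the names select_source_metrics, X_src, and y_src are illustrative.

    # Illustrative sketch of metric selection (Section 3.1): keep the top 15%
    # of source metrics by a feature-scoring function. The paper uses gain
    # ratio, chi-square, relief-F, and significance attribute evaluation;
    # mutual information is used here only as a stand-in scorer.
    import numpy as np
    from sklearn.feature_selection import SelectPercentile, mutual_info_classif

    def select_source_metrics(X_source, y_source, percentile=15):
        selector = SelectPercentile(score_func=mutual_info_classif,
                                    percentile=percentile)
        X_selected = selector.fit_transform(X_source, y_source)
        return X_selected, selector.get_support(indices=True)

    # Toy usage with random data standing in for a source defect dataset.
    rng = np.random.default_rng(0)
    X_src = rng.random((200, 40))          # 200 instances, 40 metrics
    y_src = rng.integers(0, 2, size=200)   # buggy (1) / clean (0) labels
    X_sel, kept = select_source_metrics(X_src, y_src)
    print(X_sel.shape, kept)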
3.2 Matching Source and Target Metrics
To match source and target metrics, we measure the sim-
ilarity of each source and target metric pair by using several
existing methods such as percentiles, Kolmogorov-Smirnov
Test, and Spearman’s correlation coefficient [30, 49]. We de-
fine the following three analyzers for metric matching:
Percentile based matching (PAnalyzer)
Kolmogorov-Smirnov Test based matching (KSAnalyzer)
Spearman’s correlation based matching (SCoAnalyzer)
The key idea of these analyzers is computing matching
scores for all pairs between the source and target metrics.
Figure 3 shows a sample matching. There are two source metrics (X1 and X2) and two target metrics (Y1 and Y2). Thus, there are four possible matching pairs: (X1,Y1), (X1,Y2), (X2,Y1), and (X2,Y2). The numbers in rectangles between matched source and target metrics in Figure 3 represent matching scores computed by an analyzer. For example, the matching score between the metrics X1 and Y1 is 0.8.

Figure 3: An example of metric matching between source and target datasets. The matching scores are (X1,Y1)=0.8, (X1,Y2)=0.3, (X2,Y1)=0.4, and (X2,Y2)=0.5.
From all pairs between the source and target metrics, we
remove poorly matched metrics whose matching score is not
greater than a specific cutoff threshold. For example, if the
matching score cutoff threshold is 0.3, we include only the
matched metrics whose matching score is greater than 0.3.
In Figure 3, the edge (X1,Y2) in matched metrics will be excluded when the cutoff threshold is 0.3. Thus, the candidate matching pairs we can consider include the edges (X1,Y1), (X2,Y2), and (X2,Y1) in this example. In Section 4,
we design our empirical study under different matching score
cutoff thresholds to investigate their impact on prediction.
We may not have any matched metrics based on the cutoff
threshold. In this case, we cannot conduct defect prediction.
In Figure 3, if the cutoff threshold is 0.9, none of the matched
metrics are considered for HDP so we cannot build a pre-
diction model for the target dataset. For this reason, we
investigate target prediction coverage (i.e. what percentage
of target datasets could be predicted?) in our experiments.
After applying the cutoff threshold, we used the maximum
weighted bipartite matching [31] technique to select a group
of matched metrics, whose sum of matching scores is highest,
without duplicated metrics. In Figure 3, after applying the
cutoff threshold of 0.30, we can form two groups of matched
metrics without duplicated metrics. The first group con-
sists of the edges, (X
1
,Y
1
) and (X
2
,Y
2
), and another group
consists of the edge (X
2
,Y
1
). In each group, there are no
duplicated metrics. The sum of matching scores in the first
group is 1.3 (=0.8+0.5) and that of the second group is 0.4.
The first group has a greater sum (1.3) of matching scores
than the second one (0.4). Thus, we select the first match-
ing group as the set of matched metrics for the given source
and target metrics with the cutoff threshold of 0.30 in this
example.
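As a concrete illustration of the cutoff-plus-matching step, here is a small Python sketch; it is not the authors' implementation (which uses a maximum weighted bipartite matching algorithm [31]) but approximates the selection with SciPy's assignment solver, and the name match_metrics is hypothetical.

    # Sketch of metric matching with a cutoff threshold (Section 3.2).
    # Scores not greater than the cutoff are zeroed out and filtered from
    # the result; the assignment solver then picks a duplicate-free set of
    # pairs with (near) maximal total score.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_metrics(score_matrix, cutoff=0.30):
        # score_matrix[i, j]: matching score between source metric i and
        # target metric j, as computed by an analyzer.
        scores = np.where(score_matrix > cutoff, score_matrix, 0.0)
        rows, cols = linear_sum_assignment(scores, maximize=True)
        return [(i, j) for i, j in zip(rows, cols) if score_matrix[i, j] > cutoff]

    # The Figure 3 example: rows are X1, X2; columns are Y1, Y2.
    scores = np.array([[0.8, 0.3],
                       [0.4, 0.5]])
    print(match_metrics(scores, cutoff=0.30))   # [(0, 0), (1, 1)] -> (X1,Y1), (X2,Y2)

For the Figure 3 scores, this selects (X1,Y1) and (X2,Y2), i.e. the group with the larger score sum (1.3) described above.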
Each analyzer for the metric matching scores is described
below.
3.2.1 PAnalyzer
PAnalyzer simply compares 9 percentiles (10th, 20th, ..., 90th) of the ordered values of source and target metrics.
First, we compare the n-th percentiles of the source and target metric values by the following comparison function:

P_{ij}(n) = \frac{sp_{ij}(n)}{bp_{ij}(n)}    (1)

where P_{ij}(n) is the comparison function for the n-th percentiles of the i-th source and j-th target metrics, and sp_{ij}(n) and bp_{ij}(n) are the smaller and bigger of the two n-th percentile values, respectively. For example, if the 10th percentile of the source metric values is 20 and that of the target metric values is 15, the comparison value is 0.75 (P_{ij}(10) = 15/20 = 0.75).
Using this percentile comparison function, a matching score between source and target metrics is calculated by the following equation:

M_{ij} = \frac{\sum_{k=1}^{9} P_{ij}(10 \times k)}{9}    (2)

where M_{ij} is the matching score between the i-th source and j-th target metrics. The best matching score of this equation is 1.0, which occurs when the values of the source and target metrics are the same at all 9 percentiles.
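A minimal sketch of this analyzer follows (illustrative only; panalyzer_score and the sample data are hypothetical, and nonnegative metric values are assumed, as is typical for software metrics).

    # Sketch of the PAnalyzer matching score (Eq. 1 and 2): average, over the
    # 10th..90th percentiles, of (smaller percentile / bigger percentile).
    import numpy as np

    def panalyzer_score(source_values, target_values):
        score = 0.0
        for n in range(10, 100, 10):
            sp = np.percentile(source_values, n)
            bp = np.percentile(target_values, n)
            lo, hi = min(sp, bp), max(sp, bp)
            # If both percentiles are 0, the metrics agree perfectly there.
            score += 1.0 if hi == 0 else lo / hi
        return score / 9

    src = np.random.exponential(scale=2.0, size=300)   # a source metric column
    tgt = np.random.exponential(scale=2.5, size=150)   # a target metric column
    print(panalyzer_score(src, tgt))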
3.2.2 KSAnalyzer
KSAnalyzer uses a p-value from the Kolmogorov-Smirnov
Test (KS-test) as a matching score between source and tar-
get metrics. The KS-test is a non-parametric two-sample test that is applicable when we cannot be sure about the normality or equal variance of the two samples [27, 30]. Since metrics in some defect datasets used in our empirical study have exponential distributions [36] and metrics in other datasets have unknown distributions and variances, the KS-test is a suitable statistical test to compute p-values for these datasets. In statistical testing, the p-value quantifies the evidence against the null hypothesis, here that the two samples are drawn from the same distribution. We used the KolmogorovSmirnovTest implemented in the Apache commons math library.
The matching score is:

M_{ij} = p_{ij}    (3)

where p_{ij} is the p-value from the KS-test of the i-th source and j-th target metrics. The p-value tends toward zero when two metrics are significantly different.
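A minimal sketch of this analyzer follows; the paper uses the Java Apache Commons Math class named above, and scipy.stats.ks_2samp plays the equivalent role here (ksanalyzer_score is a hypothetical name).

    # Sketch of the KSAnalyzer score (Eq. 3): the p-value of a two-sample
    # Kolmogorov-Smirnov test; values near 0 mean the distributions differ.
    import numpy as np
    from scipy.stats import ks_2samp

    def ksanalyzer_score(source_values, target_values):
        statistic, p_value = ks_2samp(source_values, target_values)
        return p_value

    src = np.random.exponential(scale=2.0, size=300)
    tgt = np.random.exponential(scale=2.0, size=150)
    print(ksanalyzer_score(src, tgt))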
3.2.3 SCoAnalyzer
In SCoAnalyzer, we used Spearman's rank correlation coefficient as a matching score for source and target metrics [49]. Spearman's rank correlation measures how strongly two samples are correlated [49]. To compute the coefficient, we used the SpearmansCorrelation class in the Apache commons math library. Since the two metric vectors must be the same size to compute the coefficient, we randomly sample values from the larger metric vector. For example, if the sizes of the source and target metric vectors are 110 and 100 respectively, we randomly select 100 metric values from the source metric so that the source and target sizes agree. All metric values are sorted before computing the coefficient.
The matching score is as follows:

M_{ij} = c_{ij}    (4)

where c_{ij} is the Spearman's rank correlation coefficient between the i-th source and j-th target metrics.
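A minimal sketch of this analyzer, following the subsample-and-sort procedure described above (scoanalyzer_score is a hypothetical name; scipy.stats.spearmanr stands in for the Apache Commons Math class):

    # Sketch of the SCoAnalyzer score (Eq. 4): Spearman correlation of two
    # metric vectors after subsampling the larger one and sorting both.
    import numpy as np
    from scipy.stats import spearmanr

    def scoanalyzer_score(source_values, target_values, seed=0):
        rng = np.random.default_rng(seed)
        src = np.asarray(source_values, dtype=float)
        tgt = np.asarray(target_values, dtype=float)
        n = min(src.size, tgt.size)
        if src.size > n:
            src = rng.choice(src, size=n, replace=False)
        if tgt.size > n:
            tgt = rng.choice(tgt, size=n, replace=False)
        coefficient, _ = spearmanr(np.sort(src), np.sort(tgt))
        return coefficient

    src = np.random.exponential(scale=2.0, size=110)
    tgt = np.random.exponential(scale=3.0, size=100)
    print(scoanalyzer_score(src, tgt))

Note that because both vectors are sorted before correlating, continuous metrics with distinct values tend to yield coefficients close to 1; this is a property of the procedure as described.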
3.3 Building Prediction Models
After applying metric selection and matching, we can fi-
nally build a prediction model using a source dataset with
selected and matched metrics. Then, as with a regular defect prediction model, we can predict defects in a target dataset whose metrics have been matched to the selected source metrics.
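Putting the pieces together, the following is a small end-to-end Python sketch under the assumptions of the earlier snippets (hdp_predict, matched_pairs, and the toy data are hypothetical; logistic regression is used here only as an example learner):

    # End-to-end sketch of Section 3.3: train on the source dataset restricted
    # to the matched source metrics, reorder the target columns to follow the
    # matched pairs, and predict labels of the target instances.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def hdp_predict(X_source, y_source, X_target, matched_pairs):
        # matched_pairs: list of (source_metric_index, target_metric_index)
        # produced by an analyzer plus the cutoff and matching step.
        src_idx = [i for i, _ in matched_pairs]
        tgt_idx = [j for _, j in matched_pairs]
        model = LogisticRegression(max_iter=1000)
        model.fit(X_source[:, src_idx], y_source)        # build (training)
        return model.predict(X_target[:, tgt_idx])       # predict (test)

    # Toy usage with the Figure 3 matching result [(0, 0), (1, 1)].
    rng = np.random.default_rng(1)
    X_src, y_src = rng.random((200, 2)), rng.integers(0, 2, 200)
    X_tgt = rng.random((50, 2))
    print(hdp_predict(X_src, y_src, X_tgt, [(0, 0), (1, 1)])[:10])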

Table 1: The 28 defect datasets from five groups.

Group            Dataset        # of instances          # of     Prediction
                                All     Buggy (%)        metrics  granularity
AEEEM [8, 37]    EQ             325     129 (39.69%)     61       Class
                 JDT            997     206 (20.66%)
                 LC             399     64 (9.26%)
                 ML             1862    245 (13.16%)
                 PDE            1492    209 (14.01%)
ReLink [56]      Apache         194     98 (50.52%)      26       File
                 Safe           56      22 (39.29%)
                 ZXing          399     118 (29.57%)
MORPH [40]       ant-1.3        125     20 (16.00%)      20       Class
                 arc            234     27 (11.54%)
                 camel-1.0      339     13 (3.83%)
                 poi-1.5        237     141 (59.49%)
                 redaktor       176     27 (15.34%)
                 skarbonka      45      9 (20.00%)
                 tomcat         858     77 (8.97%)
                 velocity-1.4   196     147 (75.00%)
                 xalan-2.4      723     110 (15.21%)
                 xerces-1.2     440     71 (16.14%)
NASA [35, 45]    cm1            327     42 (12.84%)      37       Function
                 mw1            253     27 (10.67%)
                 pc1            705     61 (8.65%)
                 pc3            1077    134 (12.44%)
                 pc4            1458    178 (12.21%)
SOFTLAB [51]     ar1            121     9 (7.44%)        29       Function
                 ar3            63      8 (12.70%)
                 ar4            107     20 (18.69%)
                 ar5            36      8 (22.22%)
                 ar6            101     15 (14.85%)
4. EXPERIMENTAL SETUP
4.1 Research Questions
To systematically evaluate heterogeneous defect predic-
tion (HDP) models, we set three research questions.
RQ1: Is heterogeneous defect prediction comparable to
WPDP (Baseline1)?
RQ2: Is heterogeneous defect prediction comparable to
CPDP using common metrics (CPDP-CM, Baseline2)?
RQ3: Is heterogeneous defect prediction comparable to
CPDP-IFS (Baseline3)?
RQ1, RQ2, and RQ3 lead us to investigate whether our HDP is comparable to WPDP (Baseline1), CPDP-CM (Baseline2), and CPDP-IFS (Baseline3) [18].
4.2 Benchmark Datasets
We collected publicly available datasets from previous stud-
ies [8, 37, 40, 51, 56]. Table 1 lists all dataset groups used in
our experiments. Each dataset group has a heterogeneous
metric set as shown in the table. Prediction Granularity in the last column of the table indicates the granularity of the instances. Since we focus on the distribution or correla-
tion of metric values when matching metrics, it is beneficial
to be able to apply the HDP approach on datasets even in
different granularity levels.
We used five groups with 28 defect datasets: AEEEM,
ReLink, MORPH, NASA, and SOFTLAB.
AEEEM was used to benchmark different defect predic-
tion models [8] and to evaluate CPDP techniques [18, 37].
Each AEEEM dataset consists of 61 metrics including object-
oriented (OO) metrics, previous-defect metrics, entropy met-
rics of change and code, and churn-of-source-code metrics [8].
Datasets in ReLink were used by Wu et al. [56] to improve
the defect prediction performance by increasing the quality
of the defect data and have 26 code complexity metrics ex-
tracted by the Understand tool [52].
The MORPH group contains defect datasets of several open source projects used in a study on dataset privacy for defect prediction [40]. The 20 metrics used
in MORPH are McCabe’s cyclomatic metrics, CK metrics,
and other OO metrics [40].
NASA and SOFTLAB contain proprietary datasets from
NASA and a Turkish software company, respectively [51].
We used five NASA datasets, which share the same metric
set in the PROMISE repository [35, 45]. We used cleaned
NASA datasets (DS′ version) [45]. For the SOFTLAB group,
we used all SOFTLAB datasets in the PROMISE reposi-
tory [35]. The metrics used in both NASA and SOFTLAB
groups are Halstead and McCabe’s cyclomatic metrics but
NASA has additional complexity metrics such as parameter
count and percentage of comments [35].
Predicting defects is conducted across different dataset
groups. For example, we build a prediction model with Apache in ReLink and test the model on velocity-1.4 in MORPH (Apache→velocity-1.4).[1]
We did not conduct defect prediction across projects in the
same group where datasets have the same metric set since
the focus of our study is on prediction across datasets with
heterogeneous metric sets. In total, we have 600 possible
prediction combinations from these 28 datasets.
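As a quick sanity check on this count, here is a small worked computation using the group sizes from Table 1 (the variable names are illustrative):

    # Ordered (source, target) dataset pairs that cross dataset-group
    # boundaries, using the group sizes from Table 1.
    group_sizes = {"AEEEM": 5, "ReLink": 3, "MORPH": 10, "NASA": 5, "SOFTLAB": 5}
    total = sum(group_sizes.values())                                  # 28 datasets
    all_ordered_pairs = total * (total - 1)                            # 756
    same_group_pairs = sum(n * (n - 1) for n in group_sizes.values())  # 156
    print(all_ordered_pairs - same_group_pairs)                        # 600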
4.3 Matching Score Cutoff Thresholds
To build HDP models, we apply various cutoff thresholds
for matching scores to observe how prediction performance
varies according to different cutoff values. Matched metrics
by analyzers have their own matching scores as explained in
Section 3. We apply different cutoff values (0.05, 0.10, 0.20, ..., 0.90) for the HDP models. If a matching score cutoff is 0.50, we remove matched metrics with a matching score ≤ 0.50 and build a prediction model with matched metrics whose score is > 0.50. The number of matched metrics varies by prediction combination. For example, when using KSAnalyzer with the cutoff of 0.05, the number of matched metrics is four in cm1→ar5 while it is one in ar6→pc3. The average number of matched metrics also varies by analyzer and cutoff value: 4 (PAnalyzer), 2 (KSAnalyzer), and 5 (SCoAnalyzer) at the cutoff of 0.05, but 1 (PAnalyzer), 1 (KSAnalyzer), and 4 (SCoAnalyzer) at the cutoff of 0.90, on average.
4.4 Baselines
We compare HDP to three baselines: WPDP (Baseline1),
CPDP using common metrics (CPDP-CM) between source
and target datasets (Baseline2), and CPDP-IFS (Baseline3).
We first compare HDP to WPDP. Comparing HDP to
WPDP will provide empirical evidence of whether our HDP
models are applicable in practice.
We conduct CPDP using only common metrics (CPDP-
CM) between source and target datasets as in previous CPDP
studies [18, 29, 51]. For example, AEEEM and MORPH
have OO metrics as common metrics so we use them to
build prediction models for datasets between AEEEM and
MORPH. Since using common metrics has been adopted to
address the limitation on heterogeneous metric sets in previ-
ous CPDP studies [18, 29, 51], we set CPDP-CM as a base-
line to evaluate our HDP models. The number of matched
metrics varies across the dataset group. Between AEEEM
[1] Hereafter a rightward arrow (→) denotes a prediction combination.
