Heterogeneous defect prediction
References
The WEKA data mining software: an update
A survey on transfer learning
An introduction to variable and feature selection
Individual comparisons by ranking methods
A metrics suite for object oriented design
Frequently Asked Questions (11)
Q2. What are the contributions mentioned in the paper "Heterogeneous defect prediction"?
The authors can build a prediction model with defect data collected from a software project and predict defects in the same project, i.e., within-project defect prediction (WPDP). Cross-project defect prediction (CPDP) instead trains on data from other projects, but existing CPDP approaches require that the source and target projects have the same metric set, i.e., the metric sets must be identical between projects. To address this limitation, the authors propose heterogeneous defect prediction (HDP) to predict defects across projects with heterogeneous metric sets. Their HDP approach conducts metric selection and metric matching to build a prediction model between projects with heterogeneous metric sets. Their empirical study on 28 subjects shows that about 68% of predictions using their approach outperform or are comparable to WPDP with statistical significance.
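For orientation, below is a minimal, hedged sketch of that pipeline, assuming NumPy feature matrices and scikit-learn's logistic regression as the base learner (a linear learner, consistent with Q8 below); the select_metrics and match_metrics helpers are hypothetical placeholders for the paper's metric selection and metric matching steps, not the authors' API.

```python
# High-level sketch of the HDP pipeline described above (assumptions:
# NumPy arrays; select_metrics/match_metrics are hypothetical helpers).
from sklearn.linear_model import LogisticRegression

def hdp_predict(source_X, source_y, target_X, select_metrics, match_metrics):
    # 1. Metric selection: keep only informative source metric columns.
    selected = select_metrics(source_X, source_y)   # list of column indices
    # 2. Metric matching: pair each selected source metric with a similar
    #    target metric; pairs index into the selected submatrix / target matrix.
    pairs = match_metrics(source_X[:, selected], target_X)
    src_cols = [selected[i] for i, j in pairs]
    tgt_cols = [j for i, j in pairs]
    # 3. Train on the matched source metrics and predict on the matched
    #    target metrics, which now line up column by column.
    model = LogisticRegression(max_iter=1000)
    model.fit(source_X[:, src_cols], source_y)
    return model.predict(target_X[:, tgt_cols])
```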
Q3. How many test splits are used for a prediction model?
For CPDP-CM, CPDP-IFS, and HDP, the authors build a prediction model by using a source dataset and test the model on the same test splits used in WPDP.
Q4. What are the main reasons why researchers have proposed different techniques to improve CPDP?
Since the performance of CPDP is usually very poor [59], researchers have proposed various techniques to improve CPDP [29, 37, 51, 54].
Q5. Why is it useful for software quality assurance teams to predict defects before releasing a software product?
If software quality assurance teams can predict defects before releasing a software product, they can effectively allocate limited resources for quality control [36, 38, 43, 58].
Q6. How do the authors measure the similarity between source and target metrics?
To match source and target metrics, the authors measure the similarity of each source-target metric pair using several existing methods such as percentiles, the Kolmogorov-Smirnov test, and Spearman's correlation coefficient [30, 49].
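As one concrete example, here is a minimal sketch of the Kolmogorov-Smirnov option using SciPy. Treating the two-sample KS test's p-value as the similarity score is an assumption (consistent with the cutoff-based filtering in Q9), not a full reproduction of the paper's matching analyzers.

```python
# Similarity of one source/target metric pair via the two-sample
# Kolmogorov-Smirnov test. A larger p-value means the two metric value
# distributions are harder to distinguish, i.e. more similar.
import numpy as np
from scipy import stats

def ks_matching_score(source_vals, target_vals):
    statistic, p_value = stats.ks_2samp(source_vals, target_vals)
    return p_value

# Example with synthetic, LOC-like values from two projects.
rng = np.random.default_rng(0)
src = rng.lognormal(mean=3.0, sigma=1.0, size=200)
tgt = rng.lognormal(mean=3.1, sigma=1.0, size=150)
print(ks_matching_score(src, tgt))  # near 1 for similar distributions
```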
Q7. How does the applicability of the HDP approach compare to CPDP?
Since the authors focus on the distribution or correlation of metric values when matching metrics, the HDP approach can be applied to datasets even at different granularity levels.
Q8. How does HDP perform with other machine learners?
In their experimental settings, HDP tends to work well with learners that assume a linear relationship between a metric and the label (bug-proneness).
Q9. How is the cutoff threshold used to select a group of matched metrics?
After applying the cutoff threshold, the authors used the maximum weighted bipartite matching technique [31] to select the group of matched metrics whose sum of matching scores is highest, without duplicating any metric.
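A minimal sketch of this step follows, assuming a precomputed score matrix. SciPy's linear_sum_assignment (Hungarian algorithm) stands in for the matching technique cited as [31], since maximizing the total matching score without reusing any metric is exactly the assignment problem; the 0.05 cutoff below is illustrative, not necessarily the paper's value.

```python
# Cutoff filtering followed by maximum weighted bipartite matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_metrics(score_matrix, cutoff=0.05):
    """score_matrix[i, j]: matching score between the i-th source metric
    and the j-th target metric. Returns (source, target) index pairs."""
    scores = np.where(score_matrix >= cutoff, score_matrix, 0.0)
    # linear_sum_assignment minimizes total cost, so negate the scores to
    # maximize their sum; no row or column is ever used twice.
    rows, cols = linear_sum_assignment(-scores)
    # Discard pairs whose score fell below the cutoff.
    return [(i, j) for i, j in zip(rows, cols) if scores[i, j] > 0]
```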
Q10. How is a matching score between source and target metrics calculated?
Using this percentile comparison function, a matching score between source and target metrics is calculated by the following equation: $M_{ij} = \frac{1}{9}\sum_{k=1}^{9} P_{ij}(10 \times k)$ (2), where $M_{ij}$ is the matching score between the i-th source and j-th target metrics.
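A minimal sketch of Eq. (2) is below, averaging a percentile comparison over the 10th through 90th percentiles. The paper's exact $P_{ij}$ is not reproduced here; as a labeled assumption, this version compares the k-th percentile values of the two metrics as a min/max ratio in [0, 1], which is sensible for the non-negative metric values typical of defect data.

```python
# Matching score M_ij per Eq. (2), with an assumed percentile comparison:
# the min/max ratio of the two metrics' k-th percentile values.
import numpy as np

def matching_score(source_vals, target_vals):
    total = 0.0
    for k in range(1, 10):                    # k = 1..9 -> 10%..90%
        s = np.percentile(source_vals, 10 * k)
        t = np.percentile(target_vals, 10 * k)
        hi = max(s, t)
        total += 1.0 if hi == 0 else min(s, t) / hi
    return total / 9                          # M_ij in Eq. (2)
```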
Q11. How does a model predict a buggy instance?
Suppose that, as in Figure 4, a simple model predicts an instance as buggy when its metric value is greater than 40.
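That toy rule is just a threshold on a single metric; a literal sketch:

```python
# Toy model from the answer above: an instance is predicted buggy when its
# metric value exceeds the threshold (40 in the Figure 4 example).
def predict_buggy(metric_value, threshold=40):
    return metric_value > threshold

print(predict_buggy(55))  # True: predicted buggy
print(predict_buggy(12))  # False: predicted clean
```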