Journal Article

Transfer learning for cross-company software defect prediction

TLDR
This paper considers the cross-company defect prediction scenario, where source and target data are drawn from different companies, and proposes a novel algorithm called Transfer Naive Bayes (TNB), which is more accurate in terms of AUC and requires less runtime than state-of-the-art methods.
Abstract
Context: Software defect prediction studies usually build models using within-company data, and very few have focused on prediction models trained with cross-company data. Models built on within-company data are difficult to employ in practice because such local data repositories are often lacking. Recently, transfer learning has attracted increasing attention as a way to build a classifier in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?
Objective: In this paper, we consider the cross-company defect prediction scenario, where source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a faster and highly effective prediction model.
Method: Unlike prior work, which selects training data that are similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB) that uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is then built on these weighted data.
Results: This article presents a theoretical analysis of the compared methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than state-of-the-art methods.
Conclusion: When there are too few local training data to train good classifiers, useful feature-level knowledge from training data with a different distribution may help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
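The Method paragraph describes three steps: estimate the distribution of the test data, turn that estimate into weights on the training data, and build the classifier on the weighted data. The sketch below is one possible reading of those steps, not the authors' implementation. Several details are assumptions that the abstract does not state: the test distribution is summarized by per-feature minimum and maximum values, the weight formula `s / (k - s + 1) ** 2` (where `s` counts the features of a training instance that fall inside the test ranges and `k` is the number of features) is an illustrative choice, and a weighted Gaussian Naive Bayes is swapped in for whatever Naive Bayes variant the paper uses. All function names and the toy data are hypothetical.

```python
import math

def feature_ranges(test_X):
    """Summarize the unlabeled target data by per-feature (min, max)."""
    return [(min(col), max(col)) for col in zip(*test_X)]

def instance_weights(train_X, ranges):
    """Weight each source instance by how well it fits the target ranges.

    Assumed formula (not given in the abstract): w = s / (k - s + 1)^2,
    where s is the number of features inside the target range.
    """
    k = len(ranges)
    weights = []
    for row in train_X:
        s = sum(lo <= v <= hi for v, (lo, hi) in zip(row, ranges))
        weights.append(s / (k - s + 1) ** 2)
    return weights

def fit_weighted_gaussian_nb(train_X, train_y, weights, var_floor=1e-6):
    """Fit Gaussian Naive Bayes where every statistic is weight-averaged."""
    classes = sorted(set(train_y))
    total = sum(weights)
    model = {}
    for c in classes:
        idx = [i for i, y in enumerate(train_y) if y == c]
        wc = sum(weights[i] for i in idx)
        prior = (wc + 1.0) / (total + len(classes))  # smoothed weighted prior
        denom = max(wc, 1e-12)  # guard against a class with zero total weight
        stats = []
        for j in range(len(train_X[0])):
            mean = sum(weights[i] * train_X[i][j] for i in idx) / denom
            var = sum(weights[i] * (train_X[i][j] - mean) ** 2 for i in idx) / denom
            stats.append((mean, max(var, var_floor)))
        model[c] = (prior, stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one instance."""
    best, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: source-company (train) and target-company (test) instances.
train_X = [[1.0, 11.5], [2.0, 12.0], [8.0, 30.0], [9.0, 30.5], [50.0, 200.0]]
train_y = [0, 0, 1, 1, 1]  # 1 = defective, 0 = clean
test_X = [[1.5, 11.0], [8.5, 31.0], [2.5, 13.0]]

weights = instance_weights(train_X, feature_ranges(test_X))
model = fit_weighted_gaussian_nb(train_X, train_y, weights)
preds = [predict(model, row) for row in test_X]
```

In this toy run, the last training instance lies entirely outside the target ranges and receives zero weight, so it does not influence the fitted class statistics. That is the intended effect of weighting all source instances rather than filtering them: instances are down-weighted gradually instead of being kept or discarded outright.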


Citations
Journal Article

Transfer learning using computational intelligence

TL;DR: This paper systematically examines computational intelligence-based transfer learning techniques, clusters related developments into four main categories, and provides state-of-the-art knowledge that will help researchers and practitioners understand developments in computational intelligence-based transfer learning research and applications.
Journal Article

A systematic review of machine learning techniques for software fault prediction

TL;DR: Machine learning techniques are able to predict software fault proneness and can be used by software practitioners and researchers; however, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results.
Journal Article

An Empirical Comparison of Model Validation Techniques for Defect Prediction Models

TL;DR: It is found that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques, and out-of-sample bootstrap validation yields the best balance between the bias and variance.
Proceedings Article

Transfer defect learning

TL;DR: A state-of-the-art transfer learning approach is applied to make feature distributions in source and target projects similar, and a novel transfer defect learning approach, TCA+, is proposed, by extending TCA.
References
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

A Survey on Transfer Learning

TL;DR: The relationship between transfer learning and other related machine learning techniques, such as domain adaptation, multitask learning, sample selection bias, and covariate shift, is discussed.
Journal Article

An introduction to ROC analysis

TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
Book Chapter

Individual Comparisons by Ranking Methods

TL;DR: The comparison of two treatments generally falls into one of two categories: (a) a number of replications for each of the two treatments, which are unpaired, or (b) a series of paired comparisons, some of which may be positive and some negative.