Cross-Project and Within-Project Semisupervised Software Defect Prediction: A Unified Approach

doi:10.1109/TR.2018.2804922

Journal Article

Fei Wu, +6 more

21 Mar 2018

IEEE Transactions on Reliability, Vol. 67, Iss. 2, pp. 581-597

TLDR

The paper proposes a cost-sensitive kernelized semisupervised dictionary learning (CKSDL) approach as a unified solution for both cross-project semisupervised defect prediction (CSDP) and within-project semisupervised defect prediction (WSDP). Experiments indicate that CKSDL outperforms state-of-the-art WSDP methods, that unlabeled cross-project defect data can help improve WSDP performance, and that CKSDL generally obtains significantly better prediction performance than related semisupervised defect prediction (SSDP) methods in the CSDP scenario.

Abstract:

When there are not enough historical defect data to build an accurate prediction model, semisupervised defect prediction (SSDP) and cross-project defect prediction (CPDP) are two feasible solutions. Existing CPDP methods assume that the available source data are well labeled. However, because labeling a large amount of defect data requires expensive human effort, usually only suitable unlabeled source data can be used. We call CPDP in this scenario cross-project semisupervised defect prediction (CSDP). Although some within-project semisupervised defect prediction (WSDP) methods have been developed in recent years, there is still much room for improvement in prediction performance. In this paper, we aim to provide a unified and effective solution for both the CSDP and WSDP problems. We introduce the semisupervised dictionary learning technique and propose a cost-sensitive kernelized semisupervised dictionary learning (CKSDL) approach. CKSDL can make full use of the limited labeled defect data and a large amount of unlabeled data in the kernel space. In addition, CKSDL considers misclassification costs in the dictionary learning process. Extensive experiments on 16 projects indicate that CKSDL outperforms state-of-the-art WSDP methods, that using unlabeled cross-project defect data can help improve WSDP performance, and that CKSDL generally obtains significantly better prediction performance than related SSDP methods in the CSDP scenario.
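
The abstract names two ingredients, working in a kernel space and weighting misclassification costs, without giving formulas. The toy sketch below illustrates only those two ideas and is not the CKSDL algorithm: it learns no dictionary and uses no unlabeled data. It assumes an RBF kernel, uses the labeled modules of each class directly as dictionary atoms, codes a query with ridge regularization, and applies a heuristic cost rule. All names (`rbf_kernel`, `kernel_residual`, `predict`, `cost_ratio`, `CLEAN`, `DEFECTIVE`) and all data values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_residual(x, atoms, gamma=0.5, lam=1e-2):
    """Squared feature-space error of reconstructing phi(x) from phi(atoms),
    using ridge-regularized coefficients (an assumption of this sketch)."""
    K = rbf_kernel(atoms, atoms, gamma)
    k = rbf_kernel(atoms, x[None, :], gamma).ravel()
    a = np.linalg.solve(K + lam * np.eye(len(atoms)), k)
    # ||phi(x) - Phi a||^2 = k(x, x) - 2 a.k + a.K.a, with k(x, x) = 1 for RBF
    return 1.0 - 2.0 * a @ k + a @ K @ a

def predict(x, clean, defective, cost_ratio=1.0, gamma=0.5, lam=1e-2):
    """Return 1 (defective) or 0 (clean).

    cost_ratio is the assumed cost of missing a defect divided by the cost
    of a false alarm; values above 1 bias the decision toward 'defective'.
    This decision rule is a heuristic chosen for illustration.
    """
    r_clean = kernel_residual(x, clean, gamma, lam)
    r_def = kernel_residual(x, defective, gamma, lam)
    return int(r_def < cost_ratio * r_clean)

# Hypothetical two-metric feature vectors for a handful of labeled modules.
CLEAN = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
DEFECTIVE = np.array([[3.0, 3.0], [3.5, 3.0], [3.0, 3.5], [3.5, 3.5]])

if __name__ == "__main__":
    borderline = np.array([1.7, 1.7])
    for ratio in (1.0, 2.0):
        print(ratio, predict(borderline, CLEAN, DEFECTIVE, cost_ratio=ratio))
```

With these toy values, the borderline module sits slightly closer to the clean group, so it is labeled clean at `cost_ratio=1.0` and defective at `cost_ratio=2.0`. That is the kind of shift a cost-sensitive method is meant to produce when missed defects are costlier than false alarms.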

Citations

Software Defect Prediction via Attention-Based Recurrent Neural Network

Seml: A Semantic LSTM Model for Software Defect Prediction

Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction

Effort-aware and just-in-time defect prediction with neural network

Multiview Transfer Learning for Software Defect Prediction

Related Papers (5)

Towards identifying software project clusters with regard to defect prediction

Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

An investigation on the feasibility of cross-project defect prediction

Transfer defect learning

On the relative value of cross-company and within-company data for defect prediction