Local versus Global Lessons for Defect Prediction and Effort Estimation

doi:10.1109/TSE.2012.83

Journal Article

Tim Menzies, +7 more

- 01 Jun 2013 -

IEEE Transactions on Software Engineerin...

- Vol. 39, Iss: 6, pp 822-834

TLDR

It is concluded that when researchers attempt to draw lessons from some historical data source, they should ignore any existing local divisions into multiple sources, cluster across all available data, then restrict the learning of lessons to the clusters from other sources that are nearest to the test data.

Abstract:

Existing research is unclear on how to generate lessons learned for defect prediction and effort estimation. Should we seek lessons that are global to multiple projects or just local to particular projects? This paper aims to comparatively evaluate local versus global lessons learned for effort estimation and defect prediction. We applied automated clustering tools to effort and defect datasets from the PROMISE repository. Rule learners generated lessons learned from all the data, from local projects, or just from each cluster. The results indicate that the lessons learned after combining small parts of different data sources (i.e., the clusters) were superior to either generalizations formed over all the data or local lessons formed from particular projects. We conclude that when researchers attempt to draw lessons from some historical data source, they should 1) ignore any existing local divisions into multiple sources, 2) cluster across all available data, then 3) restrict the learning of lessons to the clusters from other sources that are nearest to the test data.
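The three-step recommendation that closes the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering tool or the rule learner, so everything below is an assumption made for illustration: a small deterministic k-means stands in for the clusterer, the median target value of a cluster stands in for the learned "lesson", and the names `cross_source_lesson`, `kmeans_labels`, and the `(source, features, target)` row layout are hypothetical.

```python
import math
from statistics import median


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans_labels(points, k, iters=20):
    """Assumed stand-in clusterer: k-means with farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center is the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def cross_source_lesson(rows, test_source, test_features, k=2):
    """rows: list of (source, features, target) tuples (hypothetical layout).

    Step 1: ignore the source label while clustering.
    Step 2: cluster across all pooled rows.
    Step 3: learn only from the nearest cluster's rows that come from
            sources other than the test source.
    """
    labels = kmeans_labels([r[1] for r in rows], k)
    best = None
    for c in range(k):
        others = [r for r, lab in zip(rows, labels)
                  if lab == c and r[0] != test_source]
        if not others:
            continue
        d = dist(test_features, centroid([r[1] for r in others]))
        if best is None or d < best[0]:
            # Median target is a placeholder for a real rule learner.
            best = (d, median(r[2] for r in others))
    return None if best is None else best[1]
```

Calling `cross_source_lesson(rows, "A", features)` returns a value derived only from rows of sources other than `"A"` that fall in the cluster nearest to `features`. Swapping the median for an actual rule learner would bring the sketch closer to the study's setup.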

Citations

Software engineering economics

Barry Boehm

TL;DR: In this article, the authors provide an overview of economic analysis techniques and their applicability to software engineering and management, including the major estimation techniques available, the state of the art in algorithmic cost models, and the outstanding research issues in software cost estimation.

Journal Article

A Systematic Literature Review and Meta-Analysis on Cross Project Defect Prediction

Seyedrebvar Hosseini, +2 more

- 01 Feb 2019 -

IEEE Transactions on Software Engineerin...

TL;DR: Cross-project defect prediction (CPDP) is still a challenge and requires more research before trustworthy applications can take place; this work synthesises the literature to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets, and associated performances.

Journal Article

Tuning for software analytics

Wei Fu, +2 more

- 01 Aug 2016 -

Information & Software Technology

TL;DR: This paper finds that it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study, and that standard methods in software analytics need to change.

Journal Article

Studying just-in-time defect prediction using cross-project models

Yasutaka Kamei, +5 more

- 01 Oct 2016 -

Empirical Software Engineering

TL;DR: An empirical study on 11 open source projects finds that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that select models trained using other projects that are similar to the testing project, and combine the models of several other projects to produce an ensemble model.

Journal Article

A Comparative Study to Benchmark Cross-Project Defect Prediction Approaches

Steffen Herbold, +2 more

- 01 Sep 2018 -

IEEE Transactions on Software Engineerin...

TL;DR: A benchmark for CPDP is provided and it is determined that an approach proposed by Camargo Cruz and Ochimizu (2009) based on data standardization performs best and is always ranked among the statistically significant best results for all metrics and data sets.
