
An application of Bayesian network for predicting object-oriented software maintainability

C. van Koten and A.R. Gray
01 Jan 2006, Vol. 48, Iss. 1, pp. 59-67
Abstract
As the number of object-oriented software systems increases, it becomes more important for organizations to maintain those systems effectively. However, currently only a small number of maintainability prediction models are available for object-oriented systems. This paper presents a Bayesian network maintainability prediction model for an object-oriented software system. The model is constructed using object-oriented metric data in Li and Henry's datasets, which were collected from two different object-oriented systems. Prediction accuracy of the model is evaluated and compared with commonly used regression-based models. The results suggest that the Bayesian network model can predict maintainability more accurately than the regression-based models for one system, and almost as accurately as the best regression-based model for the other system.


An Application of Bayesian Network for Predicting
Object-Oriented Software Maintainability
Chikako van Koten
Andrew Gray
The Information Science
Discussion Paper Series
Number 2005/02
March 2005
ISSN 1172-6024

University of Otago
Department of Information Science
The Department of Information Science is one of six departments that make up the School of Business at the University of Otago. The department offers courses of study leading to a major in Information Science within the BCom, BA and BSc degrees. In addition to undergraduate teaching, the department is also strongly involved in postgraduate research programmes leading to MCom, MA, MSc and PhD degrees. Research projects in spatial information processing, connectionist-based information systems, software engineering and software development, information engineering and database, software metrics, distributed information systems, multimedia information systems and information systems security are particularly well supported.
The views expressed in this paper are not necessarily those of the department as a whole. The accuracy of the information presented in this paper is the sole responsibility of the authors.
Copyright
Copyright remains with the authors. Permission to copy for research or teaching purposes is granted on the condition that the authors and the Series are given due acknowledgment. Reproduction in any form for purposes other than research or teaching is forbidden unless prior written permission has been obtained from the authors.
Correspondence
This paper represents work to date and may not necessarily form the basis for the authors’ final conclusions relating to this topic. It is likely, however, that the paper will appear in some form in a journal or in conference proceedings in the near future. The authors would be pleased to receive correspondence in connection with any of the issues raised in this paper, or for subsequent publication details. Please write directly to the authors at the address provided below. (Details of final journal/conference publication venues for these papers are also provided on the Department’s publications web pages: http://www.otago.ac.nz/informationscience/pubs/). Any other correspondence concerning the Series should be sent to the DPS Coordinator.
Department of Information Science
University of Otago
P O Box 56
Dunedin
NEW ZEALAND
Fax: +64 3 479 8311
email: dps@infoscience.otago.ac.nz
www: http://www.otago.ac.nz/informationscience/

An application of Bayesian network for predicting object-oriented software maintainability
C. van Koten¹ and A.R. Gray
Department of Information Science, University of Otago, P.O. Box 56, Dunedin, New Zealand
Key words: Object-oriented systems, Maintainability, Bayesian network,
Regression tree, Regression
1 Introduction
Arguably, many object-oriented (OO) software systems are currently in use. It is also arguable that the growing popularity of OO programming languages such as Java, together with the increasing number of software development tools supporting the Unified Modelling Language (UML), encourages more OO systems to be developed now and in the future. Hence it is important that those systems are maintained effectively and efficiently. A software maintainability prediction model enables organizations to predict the maintainability of a software system and assists them with managing maintenance resources. In addition, if an accurate maintainability prediction model is available for a software system, a defensive design can be adopted, which would minimize, or at least reduce, the future maintenance effort of the system. Maintainability of a software system can be measured in different ways. In this paper, maintainability is measured as the number of changes made to the code during a maintenance period. Alternatively, maintainability may be measured as the effort required to make those changes; when maintainability is measured as effort, the predictive model is called a maintenance effort prediction model. Unfortunately, the number of software maintainability prediction models, including maintenance effort prediction models, currently available in the literature is very small.
¹ Corresponding author. Tel.: +64-3-479-8142; fax: +64-3-479-8311.
E-mail address: ckoten@infoscience.otago.ac.nz
Preprint submitted to Elsevier Science 27 February 2005
Programming an OO software system is different from programming a non-OO system because of concepts that are specific to the OO paradigm, for example, objects, inheritance and encapsulation. This difference limits the applicability of well-known non-OO software effort prediction models, such as COCOMO [3], to OO software effort prediction; it likewise limits the applicability of non-OO software metrics, such as Function Points [1], to measuring the characteristics of OO software systems [23]. Hence a number of new software metrics have been proposed specifically for OO systems, and some of them have been used to predict the maintainability of OO systems. Examples of such OO metrics are the Chidamber and Kemerer (C&K) metrics and the Li and Henry (L&H) metrics [10,25]. It was shown that the L&H metrics were correlated with the number of changes made to the code of the OO software system [25]. It was also shown that multiple linear regression models consisting of the C&K, L&H and other OO metrics were able to predict software maintenance effort for some OO systems [17].
This paper constructs an OO software maintainability prediction model using a technique known as Bayesian network [14,20,22]. This technique allows a user to construct a predictive model based on Bayesian probability theory [12]. Applications of Bayesian networks to software engineering are currently limited to a small number of studies of development effort prediction [2,11,31,34] and defect prediction [15,28]. However, Bayesian networks may also be a promising new technique for OO software maintainability prediction, owing to three abilities: to represent uncertainty explicitly using probabilities, to combine existing human expert knowledge with empirical data, and to update the model when new information becomes available. Hence this paper investigates what prediction accuracy a Bayesian network OO software maintainability prediction model can achieve.
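As a minimal sketch of the last two abilities, the code below represents a single discretised CHANGE node with Dirichlet pseudo-counts: the initial counts stand in for expert belief, and each newly observed case simply increments a count. The state names, the initial counts and the choice of Dirichlet counting are all illustrative assumptions; this excerpt does not say which learning scheme the authors' tool uses.

```python
# Hypothetical Dirichlet pseudo-counts for a discretised CHANGE node.
# The initial values stand in for expert belief and are illustrative only.
counts = {"low": 2.0, "medium": 1.0, "high": 1.0}

def probabilities(counts):
    """Current estimate of P(CHANGE): the posterior mean of the Dirichlet counts."""
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}

def update(counts, observed_state):
    """Incorporate one newly observed case by incrementing its state's count."""
    new_counts = dict(counts)
    new_counts[observed_state] += 1
    return new_counts

print(probabilities(counts))      # belief before any new data
for obs in ["high", "high", "medium"]:
    counts = update(counts, obs)  # new information arrives one case at a time
print(probabilities(counts))      # belief shifts toward the observed states
```

Because the update only adds to counts, the model never has to be rebuilt from scratch when a new case arrives, which is one simple way the "update when new information becomes available" property can be realised.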
The term prediction accuracy in this paper means how well a predictive model constructed using known data can predict the outcomes of unknown data. The Bayesian network model's prediction accuracy is evaluated using accuracy measures that are commonly found in the software effort prediction literature [16,24]: absolute residuals, the magnitude of relative error (MRE) and pred measures. The Bayesian network model's prediction accuracy is then compared with that of regression-based models, namely a regression tree [4] model and two different types of multiple linear regression models.
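The accuracy measures named above have standard definitions: the absolute residual is |actual − predicted|, MRE is the absolute residual divided by the actual value, MMRE is the mean MRE, and pred(q) is the proportion of cases whose MRE is at most q. The sketch below computes them for hypothetical CHANGE values; the threshold q = 0.25 is an assumption (this excerpt does not state which pred levels the paper reports), and MRE is undefined when an actual value is zero.

```python
from statistics import mean, median

def absolute_residuals(actual, predicted):
    """|actual - predicted| for each case."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

def mre(actual, predicted):
    """Magnitude of relative error per case; undefined for a zero actual value."""
    if any(a == 0 for a in actual):
        raise ValueError("MRE is undefined when an actual value is 0")
    return [abs(a - p) / a for a, p in zip(actual, predicted)]

def pred(actual, predicted, q=0.25):
    """pred(q): proportion of cases whose MRE is at most q."""
    errors = mre(actual, predicted)
    return sum(e <= q for e in errors) / len(errors)

# Hypothetical CHANGE values and model predictions for five test cases.
actual = [40, 10, 25, 80, 5]
predicted = [36, 14, 25, 50, 6]

print("Mean Ab.Res.:", mean(absolute_residuals(actual, predicted)))
print("Med.Ab.Res.:", median(absolute_residuals(actual, predicted)))
print("MMRE:", round(mean(mre(actual, predicted)), 3))
print("pred(0.25):", pred(actual, predicted, 0.25))
```

Lower residuals and MMRE indicate better accuracy, whereas a higher pred(q) is better, so the measures should not all be read in the same direction.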
The structure of the remainder of this paper is as follows. Section 2 describes the OO software datasets and the sampling method used. Section 3 describes the Bayesian network OO software maintainability prediction model. This is followed by Section 4, which describes the regression tree model and the multiple linear regression models. Section 5 describes the prediction accuracy measures used. Section 6 evaluates the Bayesian network model's prediction accuracy using those measures and compares it with that of the regression tree model and the multiple linear regression models. Finally, Section 7 presents conclusions and discusses directions for future studies.
2 OO software datasets
2.1 Characteristics of datasets
This paper uses the OO software datasets published by Li and Henry [25]. The datasets consist of five C&K metrics (DIT, NOC, RFC, LCOM and WMC), four L&H metrics (MPC, DAC, NOM and SIZE2) and SIZE1, a traditional lines-of-code size metric. Those metric data were collected from a total of 110 classes in two OO software systems: the User Interface Management System (UIMS) and the Quality Evaluation System (QUES). The code was written in Classical Ada™. The UIMS and QUES datasets contain 39 classes and 71 classes, respectively. Maintainability was measured by the CHANGE metric, which counts the number of lines in the code that were changed during a three-year maintenance period. Neither the UIMS nor the QUES dataset contains actual maintenance effort data. The description of each metric is given in Table 1.
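The CHANGE metric is described here only as a count of changed lines. As a sketch under stated assumptions, the function below counts changed lines between two versions of a source file using Python's standard `difflib` module. The counting rule (each added or deleted line counts as one change, so a modified line counts as a deletion plus an addition) is an assumption for illustration, not necessarily the exact rule Li and Henry applied, and the Ada fragments are hypothetical.

```python
import difflib

def count_changed_lines(old_source: str, new_source: str) -> int:
    """Count added plus deleted lines between two versions of a file.

    Assumption: a modified line appears as one deletion and one addition,
    so it contributes 2 to the count.
    """
    diff = difflib.ndiff(old_source.splitlines(), new_source.splitlines())
    # ndiff prefixes: '- ' deleted, '+ ' added, '  ' unchanged, '? ' hint lines.
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))

# Hypothetical Ada fragments: one statement replaced by two new statements.
old = "procedure Draw is\nbegin\n   null;\nend Draw;\n"
new = "procedure Draw is\nbegin\n   Refresh;\n   Paint;\nend Draw;\n"
print(count_changed_lines(old, new))
```

Summing this count over every maintenance revision of a class during the observation period would give a per-class total analogous to CHANGE, under the same counting assumption.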
The descriptive statistics of the UIMS and QUES datasets are shown in Table 2.
Pearson's correlation coefficients between CHANGE and each of the OO metrics are shown in Table 3.
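For reference, Pearson's coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it directly; the WMC and CHANGE values are hypothetical and are not taken from Li and Henry's datasets. On Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient between two samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # unnormalised covariance
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-class values (not Li and Henry's data).
wmc = [3, 8, 12, 20, 5, 15]
change = [10, 30, 45, 90, 12, 60]
print(round(pearson_r(wmc, change), 3))
```

Note that a high r only indicates a linear association; whether it is statistically significant also depends on the number of classes, which is why the dataset sizes (39 and 71) matter when reading Table 3.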
Table 3 shows that there are significant correlations between CHANGE and the OO metrics. However, Table 3 also shows that the correlations in the UIMS

References
L. Breiman, Classification and regression trees (book).
B. Boehm, Software engineering economics.
A metrics suite for object oriented design (book).
Bayesian networks and decision graphs (book).
Frequently Asked Questions
Q1. What have the authors contributed in "An application of bayesian network for predicting object-oriented software maintainability" ?

This paper presents a Bayesian network maintainability prediction model for an object-oriented software system. The results suggest that the Bayesian network model can predict maintainability more accurately than the regression-based models for one system, and almost as accurately as the best regression-based model for the other system. 

Those findings have also confirmed that Bayesian network is indeed a useful modelling technique for software maintainability prediction, although further studies are required to realize its full potential and to understand its limitations. This provides an interesting direction for future studies. The results in this paper also suggest that the prediction accuracy of the Bayesian network model may vary depending on the characteristics of the dataset and/or the prediction accuracy measure used.

After batch learning, the network predicts the posterior probability distribution of CHANGE for each case in the corresponding test subset by computing the joint probability distribution.
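The sketch below illustrates batch learning followed by posterior prediction in a deliberately small discrete network. It assumes a naive Bayes structure (CHANGE is the only parent of each metric node), two discretised metrics with hypothetical "low"/"high" states, and Laplace-smoothed counts; none of these choices is taken from the paper, whose network structure and discretisation are not shown in this excerpt. Prediction multiplies the learned probabilities into the joint P(CHANGE, evidence) and normalises over the CHANGE states.

```python
from collections import Counter, defaultdict

# Assumed structure: CHANGE is the only parent of each metric node (naive Bayes).
STATES = {"WMC": ["low", "high"], "MPC": ["low", "high"]}
CLASSES = ["low", "high"]  # hypothetical discretised CHANGE states

# Hypothetical discretised training cases: (metric evidence, CHANGE state).
train = [
    ({"WMC": "high", "MPC": "high"}, "high"),
    ({"WMC": "high", "MPC": "low"}, "high"),
    ({"WMC": "low", "MPC": "low"}, "low"),
    ({"WMC": "low", "MPC": "high"}, "low"),
    ({"WMC": "low", "MPC": "low"}, "low"),
    ({"WMC": "high", "MPC": "high"}, "high"),
]

def batch_learn(cases, alpha=1.0):
    """Estimate P(CHANGE) and P(metric | CHANGE) from counts with Laplace smoothing."""
    class_counts = Counter(cls for _, cls in cases)
    n = len(cases)
    prior = {c: (class_counts[c] + alpha) / (n + alpha * len(CLASSES)) for c in CLASSES}
    cpt = defaultdict(dict)  # cpt[metric][(state, change)] = P(metric=state | CHANGE=change)
    for metric, states in STATES.items():
        for c in CLASSES:
            counts = Counter(ev[metric] for ev, cls in cases if cls == c)
            for s in states:
                cpt[metric][(s, c)] = (counts[s] + alpha) / (class_counts[c] + alpha * len(states))
    return prior, cpt

def predict(prior, cpt, evidence):
    """Posterior P(CHANGE | evidence): build the joint, then normalise."""
    joint = {}
    for c in CLASSES:
        p = prior[c]
        for metric, state in evidence.items():
            p *= cpt[metric][(state, c)]
        joint[c] = p  # P(CHANGE = c, evidence)
    total = sum(joint.values())  # P(evidence)
    return {c: p / total for c, p in joint.items()}

prior, cpt = batch_learn(train)
print(predict(prior, cpt, {"WMC": "high", "MPC": "low"}))
```

A richer structure (for example, dependencies between metric nodes) changes only how the joint is factorised; the learn-then-normalise pattern stays the same.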

The median absolute residual (Med.Ab.Res.) is chosen as the measure of central tendency because the residual distribution is usually skewed in software datasets.
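A small hypothetical example shows why the median is preferred for skewed residuals: a single large residual pulls the mean far above the typical error, whereas the median barely moves. The values below are invented for illustration.

```python
from statistics import mean, median

# Hypothetical absolute residuals with one large outlier, a right-skewed
# pattern often seen in software data.
residuals = [2, 3, 3, 4, 5, 6, 120]

print("mean:", round(mean(residuals), 2))  # dominated by the single outlier
print("median:", median(residuals))        # reflects the typical residual
```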

Approximately two-thirds of the cases in each dataset are chosen by random sampling without replacement, using a function provided in the statistical software package SPSS 11.0.
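The split can be sketched with Python's standard `random` module; this does not reproduce SPSS 11.0's sampling function or the paper's actual subsets. The helper name `split_two_thirds` and the rounding of the training-set size are assumptions made for illustration; the dataset sizes (39 and 71 classes) come from Section 2.

```python
import random

def split_two_thirds(case_ids, seed=None):
    """Randomly choose about two-thirds of the cases (without replacement) for
    training; the remaining cases form the test subset."""
    rng = random.Random(seed)
    n_train = round(len(case_ids) * 2 / 3)
    train = rng.sample(case_ids, n_train)  # sampling without replacement
    train_set = set(train)
    test = [c for c in case_ids if c not in train_set]
    return train, test

# The UIMS and QUES datasets contain 39 and 71 classes respectively.
for name, n_cases in [("UIMS", 39), ("QUES", 71)]:
    train, test = split_two_thirds(list(range(n_cases)), seed=1)
    print(name, len(train), len(test))
```

Fixing the seed makes a split reproducible; repeating the split with different seeds would show how sensitive the accuracy measures are to the particular sample drawn.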

For the UIMS dataset, the Bayesian network model has achieved significantly better prediction accuracy than the regression tree model and the multiple linear regression models. 

The Wilcoxon signed-rank tests of the MRE values have also provided strong evidence that the Bayesian network model's MMRE value is significantly lower, and thus better, than those of the other models.
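The Wilcoxon signed-rank test compares two models on the same test cases by ranking the absolute differences of their paired MRE values. The sketch below is a minimal implementation using the normal approximation, with average ranks for ties and no tie correction of the variance; this excerpt does not say which implementation or variant the authors used, and the MRE values are hypothetical. In practice a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be preferred, especially for small samples where an exact test is more appropriate.

```python
from math import erf, sqrt

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test on paired samples (normal approximation).

    Returns (W_plus, p), where p is the one-sided p-value for the alternative
    that x tends to be smaller than y. Zero differences are dropped, tied
    |differences| receive average ranks, and the variance is not tie-corrected.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_one_sided = 0.5 * (1 + erf(z / sqrt(2)))  # P(Z <= z)
    return w_plus, p_one_sided

# Hypothetical paired MRE values for two models on the same eight test cases.
mre_a = [0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.12, 0.40]
mre_b = [0.35, 0.21, 0.45, 0.27, 0.50, 0.60, 0.20, 0.72]
print(wilcoxon_signed_rank(mre_a, mre_b))
```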

From this point of view, Bayesian networks can be considered a network of events connected by the probabilistic dependencies between them.
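The "network of events" view corresponds to the standard factorization of a Bayesian network's joint distribution, in which each variable $X_i$ is conditioned only on its parents $\mathrm{Pa}(X_i)$ in the graph. The second equation specializes it to an assumed structure in which CHANGE is the sole parent of metric nodes $M_1$ to $M_k$ with observed states $m_1$ to $m_k$; that structure is an illustrative assumption, not the network reported in the paper.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

P(\mathrm{CHANGE} \mid m_1, \dots, m_k)
  = \frac{P(\mathrm{CHANGE}) \prod_{j=1}^{k} P(m_j \mid \mathrm{CHANGE})}
         {\sum_{c} P(\mathrm{CHANGE} = c) \prod_{j=1}^{k} P(m_j \mid \mathrm{CHANGE} = c)}
```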