
Showing papers in "Empirical Software Engineering in 1999"


Journal ArticleDOI
TL;DR: Comparing several methods of analogy-based software effort estimation with each other and also with a simple linear regression model shows that people are better than tools at selecting analogues for the data set used in this study.
Abstract: Conventional approaches to software cost estimation have focused on algorithmic cost models, where an estimate of effort is calculated from one or more numerical inputs via a mathematical model. Analogy-based estimation has recently emerged as a promising approach, with comparable accuracy to algorithmic methods in some studies, and it is potentially easier to understand and apply. The current study compares several methods of analogy-based software effort estimation with each other and also with a simple linear regression model. The results show that people are better than tools at selecting analogues for the data set used in this study. Estimates based on their selections, with a linear size adjustment to the analogue's effort value, proved more accurate than estimates based on analogues selected by tools, and also more accurate than estimates based on the simple regression model.

264 citations
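The abstract does not give the adjustment formula, but a linear size adjustment is commonly taken to mean scaling the analogue's effort by the size ratio of the two projects. A minimal sketch in Python, with invented project data and a deliberately simple nearest-size analogue selection (real tools typically match on several features):

```python
# Illustrative sketch of analogy-based effort estimation with a linear
# size adjustment. The formula and the single-feature analogue selection
# are assumptions for illustration, not the paper's exact procedure.

def adjusted_estimate(analogue_effort, analogue_size, target_size):
    """Scale the analogue's effort linearly by the size ratio."""
    return analogue_effort * (target_size / analogue_size)

# Hypothetical history: (size in function points, effort in person-hours).
history = [(120, 950), (300, 2600), (80, 700)]
target_size = 150

# Choose the analogue closest in size to the target project.
analogue_size, analogue_effort = min(history, key=lambda p: abs(p[0] - target_size))

print(adjusted_estimate(analogue_effort, analogue_size, target_size))  # 1187.5
```

The study's point is that the selection step matters: human-selected analogues fed through the same adjustment outperformed tool-selected ones.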


Journal ArticleDOI
TL;DR: In this paper, the authors developed perspective-based usability inspection, which divides the large variety of usability issues along different perspectives and focuses each inspection session on one perspective, and conducted a controlled experiment to study its effectiveness, using a post-test only control group experimental design, with 24 professionals as subjects.
Abstract: Inspection is a fundamental means of achieving software usability. Past research showed that the current usability inspection techniques were rather ineffective. We developed perspective-based usability inspection, which divides the large variety of usability issues along different perspectives and focuses each inspection session on one perspective. We conducted a controlled experiment to study its effectiveness, using a post-test only control group experimental design, with 24 professionals as subjects. The control group used heuristic evaluation, which is the most popular technique for usability inspection. The experimental design and the results are presented, which show that inspectors applying perspective-based inspection not only found more usability problems related to their assigned perspectives, but also found more overall problems. Perspective-based inspection was shown to be more effective for the aggregated results of multiple inspectors, finding about 30% more usability problems for 3 inspectors. A management implication of this study is that assigning inspectors more specific responsibilities leads to higher performance. Internal and external threats to validity are discussed to help better interpret the results and to guide future empirical studies.

111 citations


Journal ArticleDOI
TL;DR: This paper summarizes the results of a working group at the Empirical Studies of Software Development and Evolution (ESSDE) workshop in Los Angeles in May 1999, and provides an overview of the existing research and results, future research directions, and important issues regarding the methodology of conducting empirical studies.
Abstract: Object-Oriented technologies are becoming pervasive in many software development organizations. However, many methods, processes, tools, or notations are being used without thorough evaluation. Empirical studies aim at investigating the performance of such technologies and the quality of the resulting object-oriented (OO) software products. In other words, the goal is to provide a scientific foundation to the engineering of OO software. This paper summarizes the results of a working group at the Empirical Studies of Software Development and Evolution (ESSDE) workshop in Los Angeles in May 1999. The authors of this paper took part in the working group and have all been involved with various aspects of empirical studies of OO software development. We therefore hope to achieve a good coverage of the current state of the art. We provide an overview of the existing research and results, future research directions, and important issues regarding the methodology of conducting empirical studies. In Section 2, we cover existing empirical studies and we relate them to the claims usually associated with OO development technologies. Section 3 describes in depth what we believe are important directions for research and, more concretely, precise research questions. Section 4 identifies what we think are important methodological points and strategies to answer these questions.

106 citations


Journal ArticleDOI
TL;DR: A benchmark for interpreting Kappa values is developed using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (this is an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment).
Abstract: Software process assessments are by now a prevalent tool for process improvement and contract risk assessment in the software industry. Given that scores are assigned to processes during an assessment, a process assessment can be considered a subjective measurement procedure. As with any subjective measurement procedure, the reliability of process assessments has important implications on the utility of assessment scores, and therefore the reliability of assessments can be taken as a criterion for evaluating an assessment's quality. The particular type of reliability of interest in this paper is interrater agreement. Thus far, empirical evaluations of the interrater agreement of assessments have used Cohen's Kappa coefficient. Once a Kappa value has been derived, the next question is “how good is it?” Benchmarks for interpreting the obtained values of Kappa are available from the social sciences and medical literature. However, the applicability of these benchmarks to the software process assessment context is not obvious. In this paper we develop a benchmark for interpreting Kappa values using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (this is an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment). The benchmark indicates that Kappa values below 0.45 are poor, and values above 0.62 constitute substantial agreement and should be the minimum aimed for. This benchmark can be used to decide how good an assessment's reliability is.

103 citations
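For reference, Cohen's Kappa corrects raw agreement for the agreement two raters would reach by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch applying the paper's benchmark; the rating data and the label for the intermediate band are invented:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's Kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    m1, m2 = Counter(rater1), Counter(rater2)
    p_e = sum((m1[c] / n) * (m2[c] / n) for c in m1.keys() | m2.keys())  # chance
    return (p_o - p_e) / (1 - p_e)

def interpret(kappa):
    """The paper's benchmark: below 0.45 poor, above 0.62 substantial.
    'moderate' for the band in between is our own label."""
    if kappa < 0.45:
        return "poor"
    return "substantial" if kappa > 0.62 else "moderate"

# Invented process-attribute ratings from two assessors.
r1 = ["F", "L", "L", "P", "F", "N", "L", "F"]
r2 = ["F", "L", "P", "P", "F", "N", "L", "L"]
k = cohens_kappa(r1, r2)
print(round(k, 2), interpret(k))  # 0.65 substantial
```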


Journal ArticleDOI
TL;DR: The Usability Problem Taxonomy (UPT) is presented, a taxonomic model in which usability problems detected in graphical user interfaces with textual components are classified from both an artifact and a task perspective.
Abstract: Although much can be gained by analyzing usability problems, there is no overall framework in which large sets of usability problems can be easily classified, compared, and analyzed. Current approaches to problem analysis that focus on identifying specific problem characteristics (such as severity or cost-to-fix) do provide additional information to the developer; however, they do not adequately support high-level (global) analysis. High-level approaches to problem analysis depend on the developer/evaluator's ability to group problems, yet commonly used techniques for organizing usability problems are incomplete and/or provide inadequate information for problem correction. This paper presents the Usability Problem Taxonomy (UPT), a taxonomic model in which usability problems detected in graphical user interfaces with textual components are classified from both an artifact and a task perspective. The UPT was built empirically using over 400 usability problem descriptions collected on real-world development projects. The UPT has two components and contains 28 categories: 19 are in the artifact component and nine are in the task component. A study was conducted showing that problems can be classified reliably using the UPT. Techniques for high-level problem analysis are explored using UPT classification of a set of usability problems detected during an evaluation of a CASE tool. In addition, ways to augment or complement existing problem analysis strategies using UPT analysis are suggested. A summary of reports from two developers who have used the UPT in the workplace provides anecdotal evidence indicating that UPT classification has improved problem identification, reporting, analysis, and prioritization prior to correction.

67 citations


Journal ArticleDOI
TL;DR: The results raise questions about the accuracy of manually collected and analyzed PSP data, indicate that integrated tool support may be required for high quality PSP data analysis, and suggest that external measures should be used when attempting to evaluate the impact of the PSP upon programmer behavior and product quality.
Abstract: The Personal Software Process (PSP) is used by software engineers to gather and analyze data about their work. Published studies typically use data collected using the PSP to draw quantitative conclusions about its impact upon programmer behavior and product quality. However, our experience using PSP led us to question the quality of data both during collection and its later analysis. We hypothesized that data quality problems can make a significant impact upon the value of PSP measures—significant enough to lead to incorrect conclusions regarding process improvement. To test this hypothesis, we built a tool to automate the PSP and then examined 89 projects completed by ten subjects using the PSP manually in an educational setting. We discovered 1539 primary errors and categorized them by type, subtype, severity, and age. To examine the collection problem we looked at the 90 errors that represented impossible combinations of data and at other less concrete anomalies in Time Recording Logs and Defect Recording Logs. To examine the analysis problem we developed a rule set, corrected the errors as far as possible, and compared the original and corrected data. We found significant differences for measures such as yield and the cost-performance ratio, confirming our hypothesis. Our results raise questions about the accuracy of manually collected and analyzed PSP data, indicate that integrated tool support may be required for high quality PSP data analysis, and suggest that external measures should be used when attempting to evaluate the impact of the PSP upon programmer behavior and product quality.

66 citations
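The abstract mentions 90 errors that were impossible combinations of data; a hedged sketch of the kind of automated consistency check a PSP support tool might apply to a Time Recording Log entry (the field names and rules are assumptions, not the authors' actual rule set):

```python
from datetime import datetime

# Illustrative consistency checks on a PSP Time Recording Log entry.
# The log format and the rules below are assumptions about what an
# "impossible combination" might look like.

def check_entry(entry):
    """Return a list of detected errors for one time log entry."""
    errors = []
    start = datetime.fromisoformat(entry["start"])
    end = datetime.fromisoformat(entry["end"])
    elapsed = (end - start).total_seconds() / 60
    if end <= start:
        errors.append("end time precedes start time")
    if entry["interruption_min"] < 0:
        errors.append("negative interruption time")
    if entry["delta_min"] != elapsed - entry["interruption_min"]:
        errors.append("recorded delta disagrees with clock times")
    return errors

entry = {"start": "1999-03-01T10:00", "end": "1999-03-01T10:50",
         "interruption_min": 10, "delta_min": 45}  # delta should be 40
print(check_entry(entry))  # ['recorded delta disagrees with clock times']
```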


Journal ArticleDOI
TL;DR: A method for estimating the size, and consequently effort and duration, of object oriented software development projects, and an adaptation of traditional function points, called “Object Oriented Function Points”, to enable the measurement of object-oriented analysis and design specifications is presented.
Abstract: We present a method for estimating the size, and consequently effort and duration, of object oriented software development projects. Different estimates may be made in different phases of the development process, according to the available information. We define an adaptation of traditional function points, called “Object Oriented Function Points”, to enable the measurement of object oriented analysis and design specifications. Tools have been constructed to automate the counting method. The novel aspect of our method is its flexibility. An organization can experiment with different counting policies, to find the most accurate predictors of size, effort, etc. in its environment. The method and preliminary results of its application in an industrial environment are presented and discussed.

59 citations
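The abstract stresses that the counting policy is configurable rather than fixed; a toy sketch of that idea, with invented categories and weights that stand in for whatever an organization calibrates locally (not the paper's actual counting rules):

```python
# Toy sketch of a configurable object-oriented function point count.
# The categories and weights are invented to illustrate experimenting
# with counting policies.

POLICY = {"attribute": 1, "method": 2, "association": 3}  # hypothetical weights

def oofp(classes, policy=POLICY):
    """Sum weighted element counts over the classes of an OO design model."""
    total = 0
    for cls in classes:
        total += policy["attribute"] * len(cls["attributes"])
        total += policy["method"] * len(cls["methods"])
        total += policy["association"] * len(cls["associations"])
    return total

design = [
    {"attributes": ["id", "name"], "methods": ["save"], "associations": ["Order"]},
    {"attributes": ["date"], "methods": ["total", "add_item"], "associations": []},
]
print(oofp(design))  # 7 + 5 = 12
```

Swapping in a different POLICY dictionary and re-counting is the experiment the abstract describes: find the weights that best predict size and effort in a given environment.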


Journal ArticleDOI
TL;DR: This paper investigates the influence of various data set characteristics and the purpose of analysis on the effectiveness of four model-building techniques—three statistical methods and one neural network method.
Abstract: Whilst some software measurement research has been unquestionably successful, other research has struggled to enable expected advances in project and process management. Contributing to this lack of advancement has been the incidence of inappropriate or non-optimal application of various model-building procedures. This obviously raises questions over the validity and reliability of any results obtained as well as the conclusions that may have been drawn regarding the appropriateness of the techniques in question. In this paper we investigate the influence of various data set characteristics and the purpose of analysis on the effectiveness of four model-building techniques—three statistical methods and one neural network method. In order to illustrate the impact of data set characteristics, three separate data sets, drawn from the literature, are used in this analysis. In terms of predictive accuracy, it is shown that no one modeling method is best in every case. Some consideration of the characteristics of data sets should therefore occur before analysis begins, so that the most appropriate modeling method is then used. Moreover, issues other than predictive accuracy may have a significant influence on the selection of model-building methods. These issues are also addressed here and a series of guidelines for selecting among and implementing these and other modeling techniques is discussed.

52 citations


Journal ArticleDOI
TL;DR: This report summarises and builds on the results of the “Directions and Methodologies for Empirical Software Engineering Research” group discussion and agrees on a three-point plan for future work.
Abstract: This report summarises and builds on the results of the “Directions and Methodologies for Empirical Software Engineering Research” group discussion. In particular, we considered the strengths, weaknesses, opportunities and threats to empirical software engineering research in light of the discussions and presentations during the workshop. The following sections describe each of these aspects of our discussion in turn. In addition, to finalise our discussion we agreed on a three-point plan for future work.

38 citations


Journal ArticleDOI
TL;DR: It is found that module-order models give management more flexible reliability enhancement strategies than classification models, and in these case studies, yielded more accurate results than corresponding discriminant models.
Abstract: Software quality models can predict the quality of modules early enough for cost-effective prevention of problems. For example, software product and process metrics can be the basis for predicting reliability. Predicting the exact number of faults is often not necessary; classification models can identify fault-prone modules. However, such models require that “fault-prone” be defined before modeling, usually via a threshold. This may not be practical due to uncertain limits on the amount of reliability-improvement effort. In such cases, predicting the rank-order of modules is more useful. A module-order model predicts the rank-order of modules according to a quantitative quality factor, such as the number of faults. This paper demonstrates how module-order models can be used for classification, and compares them with statistical classification models. Two case studies of full-scale industrial software systems compared nonparametric discriminant analysis with module-order models. One case study examined a military command, control, and communications system. The other studied a large legacy telecommunications system. We found that module-order models give management more flexible reliability enhancement strategies than classification models, and in these case studies, yielded more accurate results than corresponding discriminant models.

38 citations
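A module-order model needs only the predicted ranking, so the classification threshold can be chosen after the fact to match the available reliability-improvement effort. A minimal sketch with invented predictions and cutoffs:

```python
# Sketch of using a module-order model for classification: rank modules
# by a predicted quality factor (here, predicted faults) and flag the
# top slice sized by available effort. Data and cutoffs are invented.

predicted_faults = {"mod_a": 12.3, "mod_b": 0.4, "mod_c": 7.9,
                    "mod_d": 2.1, "mod_e": 9.5}

def fault_prone(predictions, effort_fraction):
    """Flag the top effort_fraction of modules in predicted rank order."""
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    cutoff = max(1, round(effort_fraction * len(ranked)))
    return ranked[:cutoff]

# The same ranking serves different effort budgets without remodeling,
# which is the flexibility the paper attributes to module-order models.
print(fault_prone(predicted_faults, 0.20))  # ['mod_a']
print(fault_prone(predicted_faults, 0.40))  # ['mod_a', 'mod_e']
```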


Journal ArticleDOI
TL;DR: An empirical study was performed to assess metrics developed to predict the risk of failure of a project, using data collected during 50 architecture audits performed over a period of two years for large industrial telecommunications systems.
Abstract: Architecture audits are performed very early in the software development lifecycle, typically before low level design or code implementation has begun. An empirical study was performed to assess metrics developed to predict the risk of failure of a project. The study used data collected during 50 architecture audits performed over a period of two years for large industrial telecommunications systems. The purpose of such a predictor was to identify, at a very early stage, projects that were likely to be at high risk of failure. This would enable the project to take corrective action before significant resources had been expended using a problematic architecture. Detailed information about seven of the 50 projects is presented, along with a discussion of how the proposed metric rated each of these projects. A comparison is made of the metric's evaluation and the assessment of the project made by reviewers during the review process.

Journal ArticleDOI
TL;DR: Working Group Report: ICSE'99 Workshop on Empirical Studies of Software Development and Evolution.
Abstract: Working Group Report: ICSE'99 Workshop on Empirical Studies of Software Development and Evolution

Journal ArticleDOI
TL;DR: This paper presents a systematic method for designing hypermedia that are easy to use for various types of users, along with its application to a specific case study.
Abstract: This paper presents a systematic method for designing hypermedia that are easy to use for various types of users, along with its application to a specific case study. The design phase is supported by the use of task models. We have identified criteria that indicate how information in task models can be used to identify links, design presentations, and structure the data of the hypermedia considered. Different types of users imply different task models and thus different hypermedia designs. We then show how the design obtained was evaluated using both empirical testing and metrics for hypermedia navigation. We discuss the results obtained by these two evaluation methods and how they affected the original design.

Journal ArticleDOI
TL;DR: The working group investigating the issues of empirical studies for evolving systems found that many issues were central to successful evolution and concluded that this is a very important area within software engineering.
Abstract: This paper describes the results of the working group investigating the issues of empirical studies for evolving systems. The group found that there were many issues central to successful evolution and concluded that this is a very important area within software engineering. Finally, nine main areas were selected for consideration. For each of these areas the central issues were identified, as well as success factors. In some cases success stories were also described and the critical factors accounting for the success were analysed. In some cases it was later found that a number of areas were so tightly coupled that it was important to discuss them together.

Journal ArticleDOI
TL;DR: The initial estimates of fault introduction rates can serve as a baseline against which future projects can be compared to determine whether progress is being made in reducing the fault introduction rate, and to identify those development techniques that seem to provide the greatest reduction.
Abstract: In any manufacturing environment, the fault introduction rate might be considered one of the most meaningful criteria for evaluating the goodness of the development process. In many investigations, the estimates of such a rate are often oversimplified or misunderstood, generating unrealistic expectations about the prediction power of regression models with a fault criterion. The computation of fault introduction rates in software development requires accurate and consistent measurement, which translates into demanding parallel efforts for the development organization. This paper presents the techniques and mechanisms that can be implemented in a software development organization to provide a consistent method of anticipating fault content and structural evolution across multiple projects over time. The initial estimates of fault introduction rates can serve as a baseline against which future projects can be compared to determine whether progress is being made in reducing the fault introduction rate, and to identify those development techniques that seem to provide the greatest reduction.
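As a worked illustration of the baseline idea (units and numbers are assumptions; the paper's actual measurement scheme is more involved): express the rate as faults introduced per unit of code change and compare each new project against the baseline.

```python
# Illustrative baseline comparison of fault introduction rates.
# Rate here is faults introduced per KLOC changed; the unit choice
# and the numbers are invented for the example.

projects = {
    "baseline": {"faults_introduced": 180, "kloc_changed": 60.0},
    "project_x": {"faults_introduced": 95, "kloc_changed": 40.0},
}

def rate(p):
    return p["faults_introduced"] / p["kloc_changed"]

base = rate(projects["baseline"])   # 3.0 faults/KLOC
new = rate(projects["project_x"])   # 2.375 faults/KLOC
print(f"change vs baseline: {100 * (new - base) / base:.1f}%")  # -20.8%
```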

Journal ArticleDOI
TL;DR: This paper presents a specific technique for instrumenting components in a distributed system by constructing a wrapper around the component being measured that mimics its interface.
Abstract: The Common Object Request Broker Architecture (CORBA) supports the creation of distributed systems that cross processor, language and paradigm boundaries. These systems can be large and complex entities that consume considerable resources in their creation and execution. Measurement of the characteristics of software systems is an important area of study in general, and of particular interest for distributed systems. In this paper, we present a specific technique for instrumenting components in a distributed system. The technique constructs a wrapper around the component being measured. The wrapper monitors interactions with the ORB (Object Request Broker) and other components. Each wrapper mimics the interface of the component that it is wrapping so that the remaining objects in the system do not need modification. Two approaches to wrapping the component are presented and contrasted. The result is an efficient and modular technique that can quickly be applied to a component.
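Setting CORBA specifics aside, the wrapping idea is essentially a measuring decorator that exposes the wrapped component's interface unchanged. A minimal sketch in Python with an invented component (in the paper the wrapper mediates calls through the ORB):

```python
import time

# Sketch of the measurement-wrapper idea: the wrapper exposes the same
# interface as the wrapped component, so callers need no modification,
# and it records each interaction. The component is invented.

class Inventory:                       # hypothetical component
    def lookup(self, item):
        return {"item": item, "stock": 42}

class MeasuringWrapper:
    """Mimics the wrapped component's interface, timing every call."""
    def __init__(self, component, log):
        self._component = component
        self._log = log

    def __getattr__(self, name):
        target = getattr(self._component, name)
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return target(*args, **kwargs)
            finally:
                self._log.append((name, time.perf_counter() - start))
        return timed

log = []
inventory = MeasuringWrapper(Inventory(), log)  # drop-in replacement
inventory.lookup("widget")
print(log)  # [('lookup', <elapsed seconds>)]
```

Because callers see the same interface, the wrapper can be inserted or removed without modifying the rest of the system, which is the modularity the abstract claims.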


Journal ArticleDOI
TL;DR: What is meant in the software industry by process improvement, and how empirical studies can and should be used to improve software processes, are examined.
Abstract: Empirical studies can play a multitude of roles in what is loosely called process improvement. In this paper we examine what is meant in the software industry by process improvement and how we can and should use empirical studies to improve software processes. This paper evolved from discussions at the Empirical Studies in Software Development and Evolution Workshop at ICSE'99.

Journal ArticleDOI
TL;DR: This Special Issue on Usability Engineering is meant to build a bridge between the software engineering (SE) and the human-computer interaction (HCI) research communities; the authors believe that widespread inclusion of usability engineering practices in development will be fostered by empirical studies validating these practices.
Abstract: This Special Issue on Usability Engineering is meant to build a bridge between the software engineering (SE) and the human-computer interaction (HCI) research communities. Usability engineering is emerging as the commercial and practical discipline that incorporates participatory design, rapid prototyping, and iterative evaluation. User-centered design and usability evaluations have become common practices in many organizations, but in the majority of software development shops they are still novel and not routinely practiced. Typical software engineering development cycles do not accommodate these practices. The Software Engineering Institute, which influences government and commercial practice and education, rarely mentions usability or user interface design methods. So the question remains: How can usability engineering practices become part of the mainstream of software engineering? We believe that widespread inclusion of usability engineering practices in development will be fostered by empirical studies validating these practices.
