
Showing papers in "Empirical Software Engineering in 1996"


Journal ArticleDOI
TL;DR: It is demonstrated that following the approach presented may lead to violations of the strict prescriptions and proscriptions of measurement theory, but that in practical terms these violations would have diminished consequences, especially when compared to the advantages afforded to the practicing researcher.
Abstract: Elements of measurement theory have recently been introduced into the software engineering discipline. It has been suggested that these elements should serve as the basis for developing, reasoning about, and applying measures. For example, it has been suggested that software complexity measures should be additive, that measures fall into a number of distinct types (i.e., levels of measurement: nominal, ordinal, interval, and ratio), that certain statistical techniques are not appropriate for certain types of measures (e.g., parametric statistics for less-than-interval measures), and that certain transformations are not permissible for certain types of measures (e.g., non-linear transformations for interval measures). In this paper we argue that, in spite of the importance of measurement theory, and in the context of software engineering, many of these prescriptions and proscriptions are either premature or, if strictly applied, would represent a substantial hindrance to the progress of empirical research in software engineering. This argument is based partially on studies that have been conducted by behavioral scientists and by statisticians over the last five decades. We also present a pragmatic approach to the application of measurement theory in software engineering. While following our approach may lead to violations of the strict prescriptions and proscriptions of measurement theory, we demonstrate that in practical terms these violations would have diminished consequences, especially when compared to the advantages afforded to the practicing researcher.
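
To illustrate the kind of proscription under debate (this is an invented example, not one from the paper): a monotonic but non-linear transformation of a measure leaves rank-based statistics unchanged while it can alter a parametric statistic such as the Pearson correlation. The data and the cube transformation below are hypothetical.

    # Illustrative only: invented data, not from the paper.
    # A monotonic (rank-preserving) transformation leaves Spearman's rho
    # unchanged but can change Pearson's r -- the measurement-theoretic
    # concern the paper argues is often overstated in practice.
    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    complexity = np.array([1, 2, 3, 5, 8, 13, 21], dtype=float)   # hypothetical measure
    effort     = np.array([4, 5, 9, 11, 20, 34, 60], dtype=float) # hypothetical outcome

    transformed = complexity ** 3   # monotonic, non-linear rescaling

    print("Pearson r, raw:          %.3f" % pearsonr(complexity, effort)[0])
    print("Pearson r, transformed:  %.3f" % pearsonr(transformed, effort)[0])
    print("Spearman rho, raw:         %.3f" % spearmanr(complexity, effort)[0])
    print("Spearman rho, transformed: %.3f" % spearmanr(transformed, effort)[0])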

197 citations


Journal ArticleDOI
TL;DR: The findings suggest it is not at all obvious that object-oriented software will be more maintainable in the long run; they are sufficiently important that independent researchers should attempt to verify the results.
Abstract: This empirical research was undertaken as part of a multi-method programme of research to investigate unsupported claims made of object-oriented technology. A series of subject-based laboratory experiments, including an internal replication, tested the effect of inheritance depth on the maintainability of object-oriented software. Subjects were timed performing identical maintenance tasks on object-oriented software with a hierarchy of three levels of inheritance depth and equivalent object-based software with no inheritance. This was then replicated with more experienced subjects. In a second experiment of similar design, subjects were timed performing identical maintenance tasks on object-oriented software with a hierarchy of five levels of inheritance depth and the equivalent object-based software. The collected data showed that subjects maintaining object-oriented software with three levels of inheritance depth performed the maintenance tasks significantly quicker than those maintaining equivalent object-based software with no inheritance. In contrast, subjects maintaining the object-oriented software with five levels of inheritance depth took longer, on average, than the subjects maintaining the equivalent object-based software (although statistical significance was not obtained). Subjects' source code solutions and debriefing questionnaires provided some evidence suggesting subjects began to experience difficulties with the deeper inheritance hierarchy. It is not at all obvious that object-oriented software is going to be more maintainable in the long run. These findings are sufficiently important that attempts to verify the results should be made by independent researchers.
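
The flavour of the two treatments can be suggested with a toy sketch (hypothetical code, not the experimental materials used in the study): the same behaviour expressed through a three-level inheritance hierarchy and as a flat, object-based equivalent. A maintainer of the former must trace the chain of classes to locate the code to change.

    # Hypothetical illustration of the two treatments, not the actual
    # experimental materials.

    # Object-oriented version: three levels of inheritance depth.
    class Account:
        def fee(self):
            return 1.0

    class SavingsAccount(Account):
        def fee(self):
            return super().fee() * 0.5

    class StudentSavingsAccount(SavingsAccount):
        def fee(self):
            return super().fee() * 0.0   # maintainer must trace three classes

    # Object-based equivalent: no inheritance, behaviour in one place.
    class StudentSavingsAccountFlat:
        def fee(self):
            return 1.0 * 0.5 * 0.0       # all logic visible in a single class

    print(StudentSavingsAccount().fee(), StudentSavingsAccountFlat().fee())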

153 citations


Journal ArticleDOI
TL;DR: The characterization scheme is used to structure a detailed survey of four experiments that compared reading and testing techniques for detecting defects in source code and it is expected the software engineering community will gain quantitative insights about the utility of defect-detection techniques in different environments.
Abstract: Techniques for detecting defects in source code are fundamental to the success of any software development approach. A software development organization therefore needs to understand the utility of techniques such as reading or testing in its own environment. Controlled experiments have proven to be an effective means for evaluating software engineering techniques and gaining the necessary understanding about their utility. This paper presents a characterization scheme for controlled experiments that evaluate defect-detection techniques. The characterization scheme permits the comparison of results from similar experiments and establishes a context for cross-experiment analysis of those results. The characterization scheme is used to structure a detailed survey of four experiments that compared reading and testing techniques for detecting defects in source code. We encourage educators, researchers, and practitioners to use the characterization scheme in order to develop and conduct further instances of this class of experiments. By repeating this experiment we expect the software engineering community will gain quantitative insights about the utility of defect-detection techniques in different environments.
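
A characterization scheme of this kind can be thought of as a structured record describing each experiment so that results can be compared across studies. The fields below are hypothetical stand-ins chosen for illustration; they are not the scheme actually defined in the paper.

    # Hypothetical sketch of an experiment characterization record; the
    # field names are illustrative, not the paper's actual scheme.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ExperimentCharacterization:
        techniques: List[str]          # e.g. ["code reading", "functional testing"]
        subjects: str                  # e.g. "students" or "professional developers"
        artifacts: List[str]           # programs or modules seeded with defects
        defect_classes: List[str]      # e.g. interface vs. control vs. data faults
        response_variables: List[str]  # e.g. defects found, time taken
        design: str                    # e.g. "within-subjects, counterbalanced"

    example_experiment = ExperimentCharacterization(
        techniques=["code reading", "functional testing", "structural testing"],
        subjects="student and professional programmers",
        artifacts=["small programs with seeded faults"],
        defect_classes=["interface", "control", "data"],
        response_variables=["number of defects detected", "detection cost"],
        design="fractional factorial",
    )
    print(example_experiment.techniques)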

84 citations


Journal ArticleDOI
TL;DR: A study carried out within a software development organization to evaluate the use of function points as a measure of early lifecycle software size found that the function point metric revealed a much lower productivity in the client server environment.
Abstract: This paper reports on a study carried out within a software development organization to evaluate the use of function points as a measure of early lifecycle software size. There were three major aims to the research: firstly, to determine the extent to which the component elements of function points were independent of each other and thus appropriate for an additive model of size; secondly, to investigate the relationship between effort and (1) the function point components, (2) unadjusted function points, and (3) adjusted function points, to determine whether the complexity weightings and technology adjustments were adding to the effort explanation power of the metric; and thirdly, to investigate the suitability of function points for sizing client-server developments. The results show that the component parts are not independent of each other, which supports an earlier study in this area. In addition, the complexity weights and technology factors do not improve the effort/size model, suggesting that a simplified sizing metric may be appropriate. With respect to the third aim, it was found that the function point metric revealed much lower productivity in the client-server environment. This is likely a reflection of the cost of introducing newer technologies, but it requires further research.
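
For readers unfamiliar with the metric, the sketch below shows the conventional IFPUG-style calculation that separates the pieces the study examines: component counts weighted by complexity give unadjusted function points, and fourteen general system characteristics give the value (technology) adjustment factor. The weights are the standard textbook ones and the counts are invented purely to illustrate what "unadjusted" versus "adjusted" means; they are not data from the study.

    # Conventional IFPUG-style function point calculation
    # (standard complexity weights; illustrative counts).
    WEIGHTS = {
        "external_inputs":     {"low": 3, "avg": 4,  "high": 6},
        "external_outputs":    {"low": 4, "avg": 5,  "high": 7},
        "external_inquiries":  {"low": 3, "avg": 4,  "high": 6},
        "internal_files":      {"low": 7, "avg": 10, "high": 15},
        "external_interfaces": {"low": 5, "avg": 7,  "high": 10},
    }

    # Hypothetical counts for one system, by component and complexity level.
    counts = {
        "external_inputs":     {"low": 10, "avg": 5, "high": 2},
        "external_outputs":    {"low": 6,  "avg": 4, "high": 1},
        "external_inquiries":  {"low": 8,  "avg": 2, "high": 0},
        "internal_files":      {"low": 3,  "avg": 2, "high": 1},
        "external_interfaces": {"low": 1,  "avg": 1, "high": 0},
    }

    ufp = sum(WEIGHTS[c][lvl] * n
              for c, by_level in counts.items()
              for lvl, n in by_level.items())

    # Fourteen general system characteristics, each rated 0-5.
    gsc_ratings = [3] * 14                    # hypothetical ratings
    vaf = 0.65 + 0.01 * sum(gsc_ratings)      # value adjustment factor
    afp = ufp * vaf                           # adjusted function points

    print(f"UFP = {ufp}, VAF = {vaf:.2f}, AFP = {afp:.1f}")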

52 citations


Journal ArticleDOI
TL;DR: Focusing on the software maintenance phase, this study demonstrated that reuse data can significantly improve the predictive accuracy of software quality models.
Abstract: This paper presents a case study of a software project in the maintenance phase. The case study was based on a sample of modules, representing about 13 million lines of code, from a very large telecommunications system. Software quality models were developed to predict the number of faults expected from the coding through operations phases. Since modules from the prior release were often reused to develop a new release, one model incorporated reuse data as additional independent variables. We compare this model's performance to a similar model without reuse data. Software quality models often have product metrics as the only input data for predicting quality. There is an implicit assumption that all the modules have had a similar development history, so that product attributes are the primary drivers of different quality levels. Reuse of software as components and software evolution do not fit this assumption very well, and consequently, traditional models for such environments may not have adequate accuracy. Focusing on the software maintenance phase, this study demonstrated that reuse data can significantly improve the predictive accuracy of software quality models.
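
The modelling idea can be sketched as follows (invented data and a plain least-squares fit, not the study's actual model or metrics): fit one fault-prediction model from product metrics alone and a second that adds an indicator of whether a module was changed relative to the prior release, then compare their predictive error.

    # Illustrative comparison of a quality model with and without reuse
    # data; invented data, ordinary least squares, not the study's model.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    loc        = rng.uniform(100, 2000, n)     # product metric: size
    complexity = rng.uniform(1, 50, n)         # product metric: complexity
    changed    = rng.integers(0, 2, n)         # reuse datum: changed since prior release?
    faults     = 0.002 * loc + 0.1 * complexity + 3.0 * changed + rng.normal(0, 1, n)

    def fit_and_rmse(X, y):
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return np.sqrt(np.mean((y - X1 @ beta) ** 2))

    print("RMSE, product metrics only:    %.2f"
          % fit_and_rmse(np.column_stack([loc, complexity]), faults))
    print("RMSE, product metrics + reuse: %.2f"
          % fit_and_rmse(np.column_stack([loc, complexity, changed]), faults))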

27 citations


Journal ArticleDOI
TL;DR: A research study whose objective was to develop an instrument to measure the success of the requirements engineering process is described, and evidence is presented demonstrating that the instrument has desirable psychometric properties, such as high reliability and good validity.
Abstract: There exists a strong motivation for evaluating, understanding, and improving requirements engineering practices given that a successful requirements engineering process is necessary for a successful software system. Measuring requirements engineering success is central to evaluation, understanding, and improving these practices. In this paper, a research study whose objective was to develop an instrument to measure the success of the requirements engineering process is described. The domain of this study is developing customer-specific business information systems. The main result is a subjective instrument for measuring requirements engineering success. The instrument consists of 32 indicators that cover the two most important dimensions of requirements engineering success. These two dimensions were identified during the study to be: quality of requirements engineering products and quality of requirements engineering service. Evidence is presented demonstrating that the instrument has desirable psychometric properties, such as high reliability and good validity.
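
Reliability of a multi-indicator instrument of this kind is commonly reported as Cronbach's alpha; the abstract does not say which statistic the authors used, so the following is a generic illustration with invented respondent ratings.

    # Cronbach's alpha for a multi-item instrument; respondent-by-item
    # ratings are invented for illustration.
    import numpy as np

    def cronbach_alpha(items):
        """items: 2-D array, rows = respondents, columns = indicators."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    ratings = np.array([   # hypothetical 1-5 ratings, 6 respondents x 4 indicators
        [4, 5, 4, 4],
        [3, 3, 4, 3],
        [5, 5, 5, 4],
        [2, 3, 2, 2],
        [4, 4, 5, 4],
        [3, 4, 3, 3],
    ])
    print("Cronbach's alpha = %.2f" % cronbach_alpha(ratings))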

26 citations


Journal ArticleDOI
TL;DR: This investigation will examine the static complexity of a program together with the three dynamic measures of functional, fractional, and operational complexity; the eminent value of the dynamic metrics is shown in their role as measures of test outcomes.
Abstract: This paper presents an investigation into four distinct aspects of software complexity. An initial partition of the software complexity domain would be the attributes of static software complexity and those of dynamic software complexity. Static complexity measurement views all program modules monolithically. That is, all of the code for all of the modules is measured as extracted from source code files. When computer software is actually executed, not all modules are executed to the same extent. Some receive a large proportion of execution activity. Further, when these modules execute, not all code in the modules executes. If just the code that is executed is measured for complexity, a completely different view of the program module emerges. In this investigation we will examine the static complexity of a program together with the three dynamic measures of functional, fractional, and operational complexity. The eminent value of the dynamic metrics is shown in their role as measures of test outcomes.
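
The abstract does not define the three dynamic measures precisely, so the sketch below only illustrates the general idea of execution-weighted complexity: the same modules look quite different when their static complexity is weighted by the share of execution activity they receive. Module names and numbers are invented.

    # Illustrative execution-weighted complexity; not the paper's exact
    # definitions of functional, fractional, or operational complexity.
    static_complexity = {"parser": 40, "optimizer": 75, "logger": 10}

    # Fraction of execution activity observed for each module in one test run.
    execution_profile = {"parser": 0.10, "optimizer": 0.85, "logger": 0.05}

    static_view  = sum(static_complexity.values()) / len(static_complexity)
    dynamic_view = sum(static_complexity[m] * execution_profile[m]
                       for m in static_complexity)

    print(f"unweighted (static) average complexity:  {static_view:.1f}")
    print(f"execution-weighted (dynamic) complexity: {dynamic_view:.1f}")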

14 citations


Journal ArticleDOI
TL;DR: The trend to reduce developer testing and increasingly rely upon inspection techniques and independent functional testing to shorten the development life cycle, improve testing productivity, and improve software quality is investigated.
Abstract: Pressure to compress the development life cycle and reduce the duration and resources committed to testing led to experimentation in testing at the NASA Goddard Space Flight Center's Software Engineering Laboratory. This study investigates the trend to reduce developer testing and rely increasingly upon inspection techniques and independent functional testing to shorten the development life cycle, improve testing productivity, and improve software quality.

8 citations


Journal ArticleDOI
TL;DR: The Ada programs were found to require more lines of code than their functionally equivalent FORTRAN counterparts, but the overhead for Ada diminishes as program size increases; some of the reasons for these economies of scale when using Ada are explored.
Abstract: This paper presents the results of a study comparing pairs of functionally equivalent programs written in the FORTRAN and Ada languages. We found the Ada programs to require more lines of code than their functionally equivalent FORTRAN counterparts. However, we also observed that the overhead for Ada diminishes as program size increases. Our limited data suggested that there may be a cross-over point beyond which the size of an Ada program would be smaller than a functionally equivalent FORTRAN program. We explore some of the reasons for these economies of scale when using Ada. The implications of our findings on software cost estimating are also discussed.
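
The cross-over argument can be made concrete with a simple linear model (the coefficients below are invented, not the study's data): if Ada carries a larger fixed overhead but a smaller marginal cost per unit of functionality, the two size lines intersect at some program size beyond which the Ada program is smaller.

    # Illustrative cross-over calculation with invented coefficients;
    # size_in_LOC = fixed_overhead + marginal_cost * functionality.
    a_ada, b_ada = 400.0, 3.0   # larger fixed overhead, lower marginal growth
    a_ftn, b_ftn = 100.0, 4.5   # smaller overhead, higher marginal growth

    # Lines cross where a_ada + b_ada * f == a_ftn + b_ftn * f.
    crossover = (a_ada - a_ftn) / (b_ftn - b_ada)
    print(f"Ada becomes smaller beyond {crossover:.0f} units of functionality")

    for f in (50, 200, 400):
        print(f, a_ada + b_ada * f, a_ftn + b_ftn * f)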

4 citations