
Showing papers in "Empirical Software Engineering in 2005"


Journal ArticleDOI
TL;DR: The infrastructure being designed and constructed to support controlled experimentation with testing and regression testing techniques is described, along with the impact that this infrastructure has had and can be expected to have.
Abstract: Where the creation, understanding, and assessment of software testing and regression testing techniques are concerned, controlled experimentation is an indispensable research methodology. Obtaining the infrastructure necessary to support such experimentation, however, is difficult and expensive. As a result, progress in experimentation with testing techniques has been slow, and empirical data on the costs and effectiveness of techniques remains relatively scarce. To help address this problem, we have been designing and constructing infrastructure to support controlled experimentation with testing and regression testing techniques. This paper reports on the challenges faced by researchers experimenting with testing techniques, including those that inform the design of our infrastructure. The paper then describes the infrastructure that we are creating in response to these challenges, and that we are now making available to other researchers, and discusses the impact that this infrastructure has had and can be expected to have.

1,114 citations


Journal ArticleDOI
TL;DR: A taxonomy of techniques, focusing on those for data collection and organized according to the degree of human intervention each requires, is provided, together with a discussion of how to use each technique effectively.
Abstract: Software engineering is an intensively people-oriented activity, yet too little is known about how designers, maintainers, requirements analysts and all other types of software engineers perform their work. In order to improve software engineering tools and practice, it is therefore essential to conduct field studies, i.e. to study real practitioners as they solve real problems. To do so effectively, however, requires an understanding of the techniques most suited to each type of field study task. In this paper, we provide a taxonomy of techniques, focusing on those for data collection. The taxonomy is organized according to the degree of human intervention each requires. For each technique, we provide examples from the literature, an analysis of some of its advantages and disadvantages, and a discussion of how to use it effectively. We also briefly talk about field study design in general, and data analysis.

481 citations


Journal ArticleDOI
Judith Segal
TL;DR: It is argued that the rich picture painted by the case study, and the reflections on methodology that it inspires, have a relevance that reaches beyond the original context of the study.
Abstract: This paper describes a case study of software engineers developing a library of software components for a group of research scientists, using a traditional, staged, document-led methodology. The case study reveals two problems with the use of the methodology. The first is that it demands an upfront articulation of requirements, whereas the scientists had experience, and hence expectations, of emergent requirements; the second is that the project documentation does not suffice to construct a shared understanding. Reflecting on our case study, we discuss whether combining agile elements with a traditional methodology might have alleviated these problems. We then argue that the rich picture painted by the case study, and the reflections on methodology that it inspires, have a relevance that reaches beyond the original context of the study.

116 citations


Journal ArticleDOI
TL;DR: The results show that the documentation within corrective maintenance is still a very neglected issue within the organisations studied, and none of the authors' organisations has fully implemented all their documentation requirements.
Abstract: The purpose of documentation is to describe software systems and software processes. Consistent, correct and complete documentation of a software system is an important vehicle for the maintainer to gain its understanding, to ease its learning and/or relearning processes, and to make the system more maintainable. Poor system documentation, on the other hand, is the primary reason for quick software system quality degradation and ageing. Proper process documentation records the process, its stages and tasks, executing roles, their decisions and motivations, and the results of each individual process task. It is extremely important for achieving insight and visibility into the processes, important for their meaningful process measurement and thereby pivotal for achieving high process maturity. In this paper, we report on the results of an explorative study in which we have identified a number of rudimentary documentation requirements relevant within corrective maintenance, and found out how they were implemented within eighteen software organisations in Sweden. The goal was to examine the industrial documentation practice within corrective maintenance. Our results show that the documentation within corrective maintenance is still a very neglected issue within the organisations studied. None of our organisations has fully implemented all our documentation requirements.

110 citations


Journal ArticleDOI
TL;DR: This paper presents the results of developing and evaluating an artefact (specifically, a characterisation schema) to assist with testing technique selection; when instantiated, the schema provides developers with a catalogue containing enough information for them to select the best-suited techniques for a given project.
Abstract: One of the major problems within the software testing area is how to get a suitable set of cases to test a software system. This set should assure maximum effectiveness with the least possible number of test cases. There are now numerous testing techniques available for generating test cases. However, many are never used, and just a few are used over and over again. Testers have little (if any) information about the available techniques, their usefulness and, generally, how suited they are to the project at hand, upon which to base their decision about which testing techniques to use. This paper presents the results of developing and evaluating an artefact (specifically, a characterisation schema) to assist with testing technique selection. When instantiated for a variety of techniques, the schema provides developers with a catalogue containing enough information for them to select the best-suited techniques for a given project. This ensures that the decisions they make are based on objective knowledge of the techniques rather than on perceptions, suppositions and assumptions.

85 citations
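
For illustration only, here is a minimal Python sketch of how an instantiated catalogue of this kind might be queried. The schema attributes shown (testing phase, source-code availability, tool support) are invented for the example; the paper's actual characterisation schema is considerably richer.

# Hypothetical catalogue entries; attribute names are invented for
# illustration and are not the schema's real attributes.
technique_catalogue = [
    {"name": "equivalence partitioning", "phase": "unit",
     "needs_source": False, "tool_support": True},
    {"name": "branch coverage", "phase": "unit",
     "needs_source": True, "tool_support": True},
    {"name": "random testing", "phase": "system",
     "needs_source": False, "tool_support": False},
]

def suitable(technique, project):
    """Keep techniques whose requirements the project can satisfy."""
    return (technique["phase"] == project["phase"]
            and (not technique["needs_source"] or project["has_source"]))

project = {"phase": "unit", "has_source": False}
print([t["name"] for t in technique_catalogue if suitable(t, project)])
# -> ['equivalence partitioning']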


Journal ArticleDOI
TL;DR: The authors empirically determine to what extent high correlations and low ranges might be expected among CK metrics, and suggest that a prediction system should be based not on the whole CK metrics suite but only on a subset consisting of those metrics that present neither high correlations nor low ranges.
Abstract: The object-oriented metrics suite proposed by Chidamber and Kemerer (CK) is a measurement approach towards improved object-oriented design and development practices. However, existing studies evidence traces of collinearity between some of the metrics and low ranges of other metrics, two facts which may endanger the validity of models based on the CK suite. As high correlation may be an indicator of collinearity, in this paper we empirically determine to what extent high correlations and low ranges might be expected among CK metrics. To draw conclusions as general as possible, we extract the CK metrics from a large data set (200 public domain projects) and apply statistical meta-analysis techniques to strengthen the validity of our results. Homogeneously across the projects, we found a moderate (∼0.50) to high (>0.80) correlation between some of the metrics and low ranges of other metrics. The results of this empirical analysis supply researchers and practitioners with three main pieces of advice: (a) avoid using in prediction systems those CK metrics that have correlations above 0.80; (b) test for collinearity those metrics that present moderate correlations (between 0.50 and 0.60); (c) avoid using metrics with low variance as the response in continuous parametric regression analysis. This suggests that a prediction system should be based not on the whole CK metrics suite, but only on a subset consisting of those metrics that present neither high correlations nor low ranges.

75 citations
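
As a loose illustration (not the authors' analysis pipeline), the following Python sketch shows what such a screening of the CK suite could look like: pairwise correlations above 0.80 are flagged for exclusion from prediction systems, and low-range metrics are flagged as unsuitable regression responses. The metric values are simulated; only the thresholds come from the abstract.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # simulated classes pooled across projects

metrics = pd.DataFrame({
    "WMC":  rng.poisson(10, n),     # weighted methods per class
    "DIT":  rng.integers(0, 3, n),  # depth of inheritance: low range
    "NOC":  rng.poisson(0.3, n),    # number of children: low range
    "CBO":  rng.poisson(8, n),      # coupling between objects
    "LCOM": rng.poisson(15, n),     # lack of cohesion in methods
})
metrics["RFC"] = rng.poisson(10, n) + 3 * metrics["WMC"]  # strongly tied to WMC

corr = metrics.corr(method="spearman")
high = [(a, b, round(corr.loc[a, b], 2))
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
        if abs(corr.loc[a, b]) > 0.80]  # advice (a): drop one of each such pair
narrow = [c for c in metrics.columns if metrics[c].nunique() < 5]  # advice (c)

print("Highly correlated pairs:", high)
print("Low-range metrics:", narrow)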


Journal ArticleDOI
TL;DR: An investigation into industrial practice of requirements management process improvement and its positive effects on downstream software development reveals a strong relationship between a well-defined requirements process and increased developer productivity, improved project planning through better estimations and enhanced ability for stakeholders to negotiate project scope.
Abstract: Requirements management is being recognized as one of the most important albeit difficult phases in software engineering. The literature repeatedly cites the role of well-defined requirements and requirements management process in problem analysis and project management as benefiting software development throughout the life cycle: during design, coding, testing, maintenance and documentation of software. This paper reports on the findings of an investigation into industrial practice of requirements management process improvement and its positive effects on downstream software development. The evidence reveals a strong relationship between a well-defined requirements process and increased developer productivity, improved project planning through better estimations and enhanced ability for stakeholders to negotiate project scope. These results are important since there is little empirical evidence of the actual benefits of sound requirements practice, in spite of the plethora of claims in the literature. An account of these effects not only adds to our understanding of good requirements practice but also provides strong motivation for software organizations to develop programs for improvement of their requirements processes.

63 citations


Journal ArticleDOI
TL;DR: The results suggest that OPM is better than UML in modeling the dynamics aspect of the Web applications and that the quality of the OPM models students built in the construction part was superior to that of the corresponding UML models.
Abstract: Object-Process Methodology (OPM), which is a holistic approach to modeling and evolving systems, views objects and processes as two equally important entities that describe the system's structure and behavior in a single model. Unified Modeling Language (UML), which is the standard object-oriented modeling language for software systems, separates the system model into various aspects, each of which is represented in a different view (diagram type). The exponential growth of the Web and the progress of Internet-based architectures have set the stage for the proliferation of a variety of Web applications, which are classified as hybrids between hypermedia and information systems. Such applications require a modeling approach that is capable of clearly specifying aspects of their architecture, communication, and distributive nature. Since UML and OPM are two candidates for this task, this study has been designed to establish the level of comprehension and the quality of the constructed Web application models using each one of these two approaches. In the experiment we carried out, third-year undergraduate information systems engineering students were asked to respond to comprehension and construction questions about two representative Web application models. The comprehension questions related to the system's structure, dynamics, and distribution aspects. The results suggest that OPM is better than UML in modeling the dynamics aspect of the Web applications. In specifying structure and distribution aspects, there were no significant differences. The results further suggest that the quality of the OPM models students built in the construction part was superior to that of the corresponding UML models.

52 citations


Journal ArticleDOI
TL;DR: An analysis of actual web-development project data and results from an experiment suggest that people with technical competence provided less realistic project effort estimates than those with less technical competence, suggesting that more knowledge about how to implement a requirement specification does not always lead to better estimation performance.
Abstract: Estimating the effort required to complete web-development projects involves input from people in both technical (e.g., programming) and non-technical (e.g., user interaction design) roles. This paper examines how the employees' role and type of competence may affect their estimation strategy and performance. An analysis of actual web-development project data and results from an experiment suggest that people with technical competence provided less realistic project effort estimates than those with less technical competence. This means that more knowledge about how to implement a requirement specification does not always lead to better estimation performance. We discuss two possible reasons, among others, for this observation: (1) Technical competence induces a bottom-up, construction-based estimation strategy, while lack of this competence induces a more “outside” view of the project, using a top-down estimation strategy. An “outside” view may encourage greater use of the history of previous projects and reduce the bias towards over-optimism. (2) Software professionals in technical roles perceive that they are evaluated as more skilled when providing low effort estimates. A consequence of our findings is that the choice of estimation strategy, estimation evaluation criteria and feedback are important aspects to consider when seeking to improve estimation accuracy.

51 citations


Journal ArticleDOI
TL;DR: Two simple and commonly used imputation techniques, Class Mean Imputation (CMI) and k Nearest Neighbors (k-NN), coupled with two missingness mechanisms, are examined; the findings are that CMI is the preferred technique, since it is more accurate, and that the impact of the missingness mechanism on imputation accuracy is not statistically significant.
Abstract: A very common problem when building software engineering models is dealing with missing data. To address this, there exists a range of imputation techniques. However, selecting the appropriate imputation technique can itself be a difficult problem. One reason is that these techniques make assumptions about the underlying missingness mechanism, that is, how the missing values are distributed within the data set. The problem is compounded by the fact that, for small data sets, it may be very difficult to determine what the missingness mechanism is. This means there is a danger of using an inappropriate imputation technique. It is therefore necessary to determine the safest default assumption about the missingness mechanism when imputing small data sets. We experimentally examine two simple and commonly used techniques, Class Mean Imputation (CMI) and k Nearest Neighbors (k-NN), coupled with two missingness mechanisms: missing completely at random (MCAR) and missing at random (MAR). We draw two conclusions. First, for our analysis CMI is the preferred technique, since it is more accurate. Second, and more importantly, the impact of the missingness mechanism on imputation accuracy is not statistically significant. This is a useful finding, since it suggests that even for small data sets we can reasonably make the weaker assumption that the missingness mechanism is MAR. Thus both imputation techniques have practical application for small software engineering data sets with missing values.

46 citations
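
To make the two techniques concrete, here is a hedged Python sketch of CMI and k-NN imputation on an invented data set. It follows the usual textbook definitions of the two techniques; the function names, column names and data are the editor's own, not the authors' implementation.

import numpy as np
import pandas as pd

def cmi_impute(df, class_col):
    """Class Mean Imputation: replace each missing value with its class mean."""
    out = df.copy()
    value_cols = [c for c in df.columns if c != class_col]
    out[value_cols] = df.groupby(class_col)[value_cols].transform(
        lambda s: s.fillna(s.mean()))
    return out

def knn_impute(df, class_col, k=3):
    """k-NN imputation: average the k nearest complete cases."""
    out = df.copy()
    value_cols = [c for c in df.columns if c != class_col]
    complete = df.dropna(subset=value_cols)
    for idx, row in df[df[value_cols].isna().any(axis=1)].iterrows():
        known = [c for c in value_cols if pd.notna(row[c])]
        # Euclidean distance over the observed features only.
        dist = ((complete[known] - row[known]) ** 2).sum(axis=1) ** 0.5
        donors = complete.loc[dist.nsmallest(k).index]
        for c in value_cols:
            if pd.isna(row[c]):
                out.at[idx, c] = donors[c].mean()
    return out

data = pd.DataFrame({
    "effort": [12.0, np.nan, 9.5, 30.0, np.nan, 11.0],
    "size":   [100, 110, np.nan, 400, 120, 95],
    "class":  ["small", "small", "small", "large", "small", "small"],
})
print(cmi_impute(data, "class"))
print(knn_impute(data, "class"))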


Journal ArticleDOI
TL;DR: A new scheme for characterizing the level of confusion exhibited by projects, based on an empirical questionnaire, is proposed; the characterization of confused projects was successful, and the scheme is also applicable to the detection of risky projects.
Abstract: During software development, projects often experience risky situations. If projects fail to detect such risks, they may exhibit confused behavior. In this paper, we propose a new scheme for characterizing the level of confusion exhibited by projects, based on an empirical questionnaire. First, we designed a questionnaire around five project viewpoints: requirements, estimates, planning, team organization, and project management activities. Each of these viewpoints was assessed using questions that probe experience and knowledge of software risks. Second, we classified projects into "confused" and "not confused" using the resulting metrics data. Third, we analyzed the relationship between responses to the questionnaire and the degree of confusion of the projects using logistic regression analysis, constructing a model to characterize confused projects. Experimental results using actual project data show that 28 out of 32 projects were characterized correctly. We therefore concluded that the characterization of confused projects was successful. Furthermore, we applied the constructed model to data from other projects in order to detect risky projects. Here, 7 out of 8 projects were classified correctly, so we concluded that the proposed scheme is also applicable to the detection of risky projects.
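
As a minimal sketch of the modelling step described above, the following Python snippet assumes (hypothetically) one aggregate score per questionnaire viewpoint; the data and the labelling rule are invented, and only the step of fitting a logistic regression to questionnaire responses mirrors the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 32  # same number of projects as in the study

# Hypothetical aggregate score (1 = poor, 5 = good) for each viewpoint:
# requirements, estimates, planning, team organization, management.
X = rng.uniform(1, 5, size=(n, 5))

# Invented ground truth: the quarter of projects with the worst
# requirements and planning scores are labelled "confused".
risk = 1.2 * (5 - X[:, 0]) + 1.0 * (5 - X[:, 2])
y = (risk > np.quantile(risk, 0.75)).astype(int)

model = LogisticRegression().fit(X, y)
correct = int((model.predict(X) == y).sum())
print(f"Characterized correctly: {correct} of {n}")  # the paper reports 28 of 32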

Journal ArticleDOI
TL;DR: An empirical study is presented of a method that enables quantification of the perceived support different software architectures give for different quality attributes, which in turn enables an informed decision about which architecture candidate best fits the mixture of quality attributes required by a system being designed.
Abstract: To sustain the qualities of a software system during evolution, and to adapt the quality attributes as the requirements evolve, it is necessary to have a clear software architecture that is understood by all developers and to which all changes to the system adhere. This software architecture can be created beforehand, but must also be updated to reflect changes in the domain, and hence in the requirements of the software. The choice of which software architecture to use is typically based on informal decisions. There exists, to the best of our knowledge, little factual knowledge of which quality attributes are supported or obstructed by different architecture approaches. In this paper we present an empirical study of a method that enables quantification of the perceived support different software architectures give for different quality attributes. This in turn enables an informed decision about which architecture candidate best fits the mixture of quality attributes required by a system being designed.
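
The abstract does not give the quantification procedure itself; purely as an illustration of the kind of decision it enables, the sketch below weights a hypothetical architecture-by-quality-attribute support matrix by the priorities a system requires and picks the best-fitting candidate. All names and numbers are invented.

import numpy as np

attributes = ["performance", "maintainability", "security", "usability"]
candidates = ["layered", "microkernel", "blackboard"]

# Perceived support scores, e.g. as elicited from developers (invented).
support = np.array([
    [0.6, 0.9, 0.7, 0.5],  # layered
    [0.7, 0.6, 0.8, 0.6],  # microkernel
    [0.8, 0.4, 0.5, 0.7],  # blackboard
])
# Relative importance of each attribute for the system being designed.
priorities = np.array([0.4, 0.3, 0.2, 0.1])

fit = support @ priorities
for name, score in zip(candidates, fit):
    print(f"{name:12s} {score:.2f}")
print("Best fit:", candidates[int(np.argmax(fit))])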

Journal ArticleDOI
TL;DR: Controlled experiments conducted to investigate two approaches to applying use case models in an object-oriented design process show that the technique chosen for the transition from use cases to class diagrams affects the quality of the class diagrams, but also that the effects of the techniques depend on the category of developer applying them and on the tool with which the technique is applied.
Abstract: Several approaches have been proposed for the transition from functional requirements to object-oriented design. In a use case-driven development process, the use cases are important input for the identification of classes and their methods. There is, however, no established, empirically validated technique for the transition from use cases to class diagrams. One recommended technique is to derive classes by analyzing the use cases. It has, nevertheless, been reported that this technique leads to problems, such as the developers missing requirements and mistaking requirements for design. An alternative technique is to identify classes from a textual requirements specification and subsequently apply the use case model to validate the resulting class diagram. This paper describes two controlled experiments conducted to investigate these two approaches to applying use case models in an object-oriented design process. The first experiment was conducted with 53 students as subjects. Half of the subjects used a professional modelling tool; the other half used pen and paper. The second experiment was conducted with 22 professional software developers as subjects, all of whom used one of several modelling tools. The first experiment showed that applying use cases to validate class diagrams constructed from textual requirements led to more complete class diagrams than did the derivation of classes from a use case model. In the second experiment, however, we found no such difference between the two techniques. In both experiments, deriving class diagrams from the use cases led to a better structure of the class diagrams. The results of the experiments therefore show that the technique chosen for the transition from use cases to class diagrams affects the quality of the class diagrams, but also that the effects of the techniques depend on the category of developer applying them and on the tool with which the technique is applied.

Journal ArticleDOI
TL;DR: It is demonstrated that the collection of feedback during experiments provides useful additional data to validate the data obtained from other sources about solution times and quality of solutions and to understand subjects’ perception of experiments.
Abstract: Objective: To improve the qualitative data obtained from software engineering experiments by gathering feedback during experiments. Rationale: Existing techniques for collecting quantitative and qualitative data from software engineering experiments do not provide sufficient information to validate or explain all our results. Therefore, we would like a cost-effective and unobtrusive method of collecting feedback from subjects during an experiment to augment other sources of data. Design of study: We formulated a set of qualitative questions that might be answered by collecting feedback during software engineering experiments. We then developed a tool to collect such feedback from experimental subjects. This feedback-collection tool was used in four different experiments and we evaluated the usefulness of the feedback obtained in the context of each experiment. The feedback data was triangulated with other sources of quantitative and qualitative data collected for the experiments. Results: We have demonstrated that the collection of feedback during experiments provides useful additional data to: validate the data obtained from other sources about solution times and quality of solutions; check process conformance; understand problem solving processes; identify problems with experiments; and understand subjects' perception of experiments. Conclusions: Feedback collection has proved useful in four experiments and we intend to use the feedback-collection tool in a range of other experiments to further explore the cost-effectiveness and limitations of this technique. It is also necessary to carry out a systematic study to more fully understand the impact of the feedback-collecting tool on subjects' performance in experiments.

Journal ArticleDOI
TL;DR: An innovative method is presented that circumvents the complexities, computational overhead, and difficulties involved in calibrating pure or direct three-group classification models; it yielded promising results in a case study of a large-scale industrial software system.
Abstract: The primary aim of risk-based software quality classification models is to detect, prior to testing or operations, components that are most likely to be of high risk. Their practical usage as quality assurance tools is gauged by the prediction accuracy and cost-effectiveness of the models. Classifying modules into two risk groups is the more commonly practiced trend. Such models assume that all modules predicted as high-risk will be subjected to quality improvements. Given the always-limited reliability improvement resources and the variability of the quality risk-factor, a more focused classification model may be desired to achieve cost-effective software quality assurance goals. In such cases, calibrating a three-group (high-risk, medium-risk, and low-risk) classification model is more rewarding. We present an innovative method that circumvents the complexities, computational overhead, and difficulties involved in calibrating pure or direct three-group classification models. With the application of the proposed method, practitioners can utilize an existing two-group classification algorithm thrice in order to yield the three risk-based classes. An empirical approach is taken to investigate the effectiveness and validity of the proposed technique. Some commonly used classification techniques are studied to demonstrate the proposed methodology. They include the C4.5 decision tree algorithm, discriminant analysis, and case-based reasoning. For the first two, we compare the three-group model calibrated using the respective technique with the one built by applying the proposed method. Any two-group classification technique can be employed by the proposed method, including those that do not provide a direct three-group classification model, e.g., logistic regression and certain binary classification trees, such as CART. Based on a case study of a large-scale industrial software system, it is observed that the proposed method yielded promising results. For a given classification technique, the expected cost of misclassification of the proposed three-group models was generally significantly better than that of the technique's direct three-group model. In addition, the proposed method is also evaluated against an alternate indirect three-group classification method.
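
The abstract says the two-group algorithm is applied thrice but does not spell out the composition. One plausible reading, sketched below in Python, is a one-vs-one scheme: one binary classifier per pair of risk classes, combined by majority vote (a decision tree stands in for C4.5). This is an illustration of the general idea, not the authors' exact method, and all data are invented.

from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for C4.5

def three_group(X, y, X_new, classes=("low", "medium", "high")):
    """One binary classifier per pair of classes, combined by majority vote."""
    votes = [[] for _ in range(len(X_new))]
    for a, b in combinations(classes, 2):  # three pairs, hence "thrice"
        mask = np.isin(y, [a, b])
        clf = DecisionTreeClassifier(max_depth=3, random_state=0)
        clf.fit(X[mask], y[mask])
        for i, pred in enumerate(clf.predict(X_new)):
            votes[i].append(pred)
    # Three-way ties are possible; Counter then keeps the first class seen.
    return [Counter(v).most_common(1)[0][0] for v in votes]

rng = np.random.default_rng(2)
X = rng.normal(size=(90, 4))  # invented module metrics
y = np.repeat(["low", "medium", "high"], 30)
X[y == "high"] += 1.5  # make the risk groups roughly separable
X[y == "low"] -= 1.5
print(three_group(X, y, X[:5]))  # modules 0-4 belong to the "low" group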

Journal ArticleDOI
TL;DR: The authors show how metaphor enabled them to produce more general statements regarding the tensions and their amelioration, and then introduce results from a fifth company, which generally support those statements.
Abstract: This paper reports on an experience of using metaphor in qualitative research of software engineering in practice. Our project aimed to uncover non-technical factors affecting the adoption and evolution of Software Quality Management Systems (referred to here as 'the quality process'). Previously we have reported the tensions we uncovered around the quality process in four companies, based on semi-structured interviews. This paper extends this work by applying metaphor to the results. We show how we were able to produce more general statements regarding the tensions and their amelioration, and then introduce results from a fifth company, which we compare against our general statements. We find that these statements are generally supported by results from this fifth company. Finally we present some reflections on our experience of using metaphor in this way.

Journal ArticleDOI
TL;DR: A qualitative examination of the project plans produced by all teams indicates that the primary reasons that teams with less experience of either type produce lower cost estimates are that they have failed to include some tasks that are included by more experienced teams, and that they have estimated shorter task durations than have the more experienced teams.
Abstract: Data from 135 teams that have participated in a software project planning exercise are analyzed to determine the relationship between team experience and each team's estimate of total project cost. The analysis shows that cost estimates are dependent upon two kinds of team experience: (1) the average experience for the members of each team and (2) whether or not any members of the team have similar project experience. It is shown that if no members of a planning team have had similar project experience then the estimate of cost is correlated with average team experience, with teams having greater average team experience producing higher total cost estimates. If at least one member of the planning team has had similar project experience then there is a weaker relationship between average team experience and cost, and cost estimates produced by those teams with similar project experience are close to those produced by teams with the greatest average team experience. A qualitative examination of the project plans produced by all teams indicates that the primary reasons that teams with less experience of either type produce lower cost estimates are that they have failed to include some tasks that are included by more experienced teams, and that they have estimated shorter task durations than have the more experienced teams.

Journal ArticleDOI
TL;DR: This work investigates evolvability at the analysis level, i.e. at the level of the conceptual models of information systems, and indicates that, for some types of change, abstract models are more evolvable than concrete ones.
Abstract: In today's dynamic environments, evolvability of information systems is an increasingly important characteristic. We investigate evolvability at the analysis level, i.e. at the level of the conceptual models of information systems (e.g. UML models). More specifically, we focus on the influence of the level of abstraction of the conceptual model on the evolvability of the model. Abstraction or genericity is a fundamental principle in several research areas such as reuse, patterns, software architectures and application frameworks. The literature contains numerous but vague claims that software based on abstract conceptual models has evolvability advantages. Hypotheses were tested with regard to whether the level of abstraction influences the time needed to apply a change, the correctness of the change and the structure degradation incurred. Two controlled experiments were conducted with 136 subjects. Correctness and structure degradation were rated by human experts. Results indicate that, for some types of change, abstract models are more evolvable than concrete ones. Our results provide insight into how the rather vague claims in the literature should be interpreted.

Journal ArticleDOI
TL;DR: Based on these findings, it is recommended that vendors should provide CASE tools with adaptable methodology support, which allow their users to fit automated consistency assurance to the task at hand.
Abstract: This paper reports the results of a controlled experiment undertaken to investigate whether the methodology support offered by a CASE tool has an impact on the tool's acceptance and actual use by individuals. Subjects used the process modelling tool SPEARMINT to complete a partial process model and remove all inconsistencies. Half the subjects used a variant of SPEARMINT that corrected consistency violations automatically and silently, whilst the other half used a variant of SPEARMINT that told them about inconsistencies both immediately and persistently but without automatic correction. Measurement of acceptance and prediction of actual use was based on the technology acceptance model, supplemented by beliefs about consistency rules. The impact of the form of automated consistency assurance applied to hierarchical consistency rules was found to be significant at the 0.05 level (type I error of 0.027), explaining 71.6% of the variance in CASE tool acceptance. However, intention to use, and thus predicted use, was of the same magnitude for both variants of SPEARMINT, whereas perceived usefulness and perceived ease of use were affected in opposite directions. Internal validity of the findings was threatened by validity and reliability issues related to beliefs about consistency rules; here, further research is needed to develop valid constructs and reliable scales. Following the experiment, a small survey among experienced users of SPEARMINT found that different forms of automated consistency assurance were preferred depending on individual, consistency rule, and task characteristics. Based on these findings, it is recommended that vendors provide CASE tools with adaptable methodology support, which allows their users to fit automated consistency assurance to the task at hand.