
Showing papers in "Empirical Software Engineering in 2000"


Journal ArticleDOI
TL;DR: It is found that the differences are only minor, and it is concluded that software engineering students may be used instead of professional software developers under certain conditions.
Abstract: In many studies in software engineering, students are used instead of professional software developers, although the objective is to draw conclusions valid for professional software developers. This paper presents a study in which the difference between the two groups is evaluated. People from the two groups individually carried out a non-trivial software engineering judgement task involving the assessment of how ten different factors affect the lead-time of software development projects. It is found that the differences are only minor, and it is concluded that software engineering students may be used instead of professional software developers under certain conditions. These conditions are identified and described based on generally accepted criteria for validity evaluation of empirical studies.

674 citations


Journal ArticleDOI
TL;DR: This paper presents a statistical simulation tool, namely the bootstrap method, which helps the user in tuning the analogy approach before application to real projects, and shows how measures of accuracy and in particular, confidence intervals, may be computed for the analogy-based estimates, using thebootstrap method with different assumptions about the population distribution of the data set.
Abstract: Estimation of a software project's effort, based on project analogies, is a promising method in the area of software cost estimation. Projects in a historical database that are analogous (similar) to the project under examination are detected, and their effort data are used to produce estimates. As in all software cost estimation approaches, important decisions must be made regarding certain parameters in order to calibrate with local data and obtain reliable estimates. In this paper, we present a statistical simulation tool, namely the bootstrap method, which helps the user in tuning the analogy approach before application to real projects. This is an essential step of the method, because if inappropriate values for the parameters are selected in the first place, the estimate will inevitably be wrong. Additionally, we show how measures of accuracy, and in particular confidence intervals, may be computed for the analogy-based estimates, using the bootstrap method with different assumptions about the population distribution of the data set. Estimate confidence intervals are necessary in order to assess point estimate accuracy and to assist risk analysis and project planning. Examples of bootstrap confidence intervals and a comparison with regression models are presented on well-known cost data sets.
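
For readers unfamiliar with the technique, the core of the procedure can be sketched in a few lines of Python. This is a minimal illustration on invented data, not the authors' implementation: it resamples a hypothetical project database with replacement, computes an analogy estimate (mean effort of the k projects closest in size) on each resample, and reads a percentile confidence interval off the resulting distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical historical database: (size in KLOC, effort in person-months).
projects = np.array([
    [10, 24], [12, 30], [8, 18], [20, 55], [15, 40],
    [25, 70], [9, 22], [18, 50], [30, 95], [14, 38],
])

def analogy_estimate(db, new_size, k=3):
    """Mean effort of the k projects most similar in size."""
    distances = np.abs(db[:, 0] - new_size)
    nearest = db[np.argsort(distances)[:k]]
    return nearest[:, 1].mean()

new_size = 16          # size of the project to be estimated
B = 2000               # number of bootstrap resamples
boot_estimates = np.empty(B)
for b in range(B):
    sample = projects[rng.integers(0, len(projects), len(projects))]
    boot_estimates[b] = analogy_estimate(sample, new_size)

point = analogy_estimate(projects, new_size)
lo, hi = np.percentile(boot_estimates, [2.5, 97.5])
print(f"point estimate: {point:.1f} PM, 95% bootstrap CI: [{lo:.1f}, {hi:.1f}]")
```

The paper studies more refined variants (tuning the number of analogies, different distributional assumptions), but the percentile interval above captures the basic idea.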

239 citations


Journal ArticleDOI
Tore Dybå
TL;DR: The main result is an instrument for measuring the key factors of success in SPI based on data collected from 120 software organizations and the measures were found to have satisfactory psychometric properties.
Abstract: Understanding how to implement SPI successfully is arguably the most challenging issue facing the SPI field today. The SPI literature contains many case studies of successful companies and descriptions of their SPI programs. However, there has been no systematic attempt to synthesize and organize the prescriptions offered. The research efforts to date are limited and inconclusive, and lack adequate theoretical and psychometric justification. This paper provides a synthesis of prescriptions for successful quality management and process improvement drawn from an extensive review of the quality management, organizational learning, and software process improvement literature. The literature review was confirmed by empirical studies among both researchers and practitioners. The main result is an instrument for measuring the key factors of success in SPI, based on data collected from 120 software organizations. The measures were found to have satisfactory psychometric properties. Hence, managers can use the instrument to guide SPI activities in their respective organizations, and researchers can use it to build models relating the facilitating factors to both learning processes and SPI outcomes.

201 citations


Journal ArticleDOI
TL;DR: These hints are meant to help with judging empirical work and reduce some of the angst associated with accepting empirical papers.
Abstract: Papers about empirical work in software engineering are still somewhat of a novelty and reviewers, especially those inexperienced with empirical work themselves, are often unsure whether a paper is good enough for publication. Conservative reviewers tend to err on the side of rejection, i.e., may sometimes reject a paper that, though not perfect, would advance our understanding or the empirical methods used. These hints are meant to help with judging empirical work and reduce some of the angst associated with accepting empirical papers.

200 citations


Journal ArticleDOI
TL;DR: The results suggest that a combination of multiple perspectives may not give higher coverage of the defects compared to single-perspective reading, but further studies are needed to increase the understanding of perspective difference.
Abstract: Perspective-Based Reading (PBR) is a scenario-based inspection technique where several reviewers read a document from different perspectives (e.g. user, designer, tester). The reading is made according to a special scenario, specific for each perspective. The basic assumption behind PBR is that the perspectives find different defects and a combination of several perspectives detects more defects compared to the same amount of reading with a single perspective. This paper presents a study which analyses the differences in perspectives. The study is a partial replication of previous studies. It is conducted in an academic environment using graduate students as subjects. Each perspective applies a specific modelling technique: use case modelling for the user perspective, equivalence partitioning for the tester perspective and structured analysis for the design perspective. A total of 30 subjects were divided into 3 groups, giving 10 subjects per perspective. The analysis results show that (1) there is no significant difference among the three perspectives in terms of defect detection rate and number of defects found per hour, (2) there is no significant difference in the defect coverage of the three perspectives, and (3) a simulation study shows that 30 subjects is enough to detect relatively small perspective differences with the chosen statistical test. The results suggest that a combination of multiple perspectives may not give higher coverage of the defects compared to single-perspective reading, but further studies are needed to increase the understanding of perspective difference.
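
Result (3) rests on a standard Monte Carlo power calculation: simulate the experiment many times under an assumed true difference between perspectives and count how often the test rejects. A rough sketch, with invented effect sizes and a one-way ANOVA standing in for whichever test the authors actually used:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Assumed per-perspective mean detection rates and spread; 10 subjects per
# perspective, matching the study's group sizes. All numbers are hypothetical.
means = [0.25, 0.28, 0.31]          # a "relatively small" perspective difference
sd, n, runs, alpha = 0.05, 10, 5000, 0.05

rejections = 0
for _ in range(runs):
    groups = [rng.normal(m, sd, n) for m in means]
    _, p = f_oneway(*groups)        # one-way ANOVA across the three perspectives
    rejections += p < alpha

print(f"estimated power with {n} subjects per perspective: {rejections / runs:.2f}")
```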

121 citations


Journal ArticleDOI
TL;DR: A practical classification rule is presented in the context of classification tree models that allows appropriate emphasis to be placed on each type of misclassification according to the needs of the project; this is especially important when there are few faulty modules.
Abstract: Software product and process metrics can be useful predictors of which modules are likely to have faults during operations. Developers and managers can use such predictions by software quality models to focus enhancement efforts before release. However, in practice, the software quality modeling methods in the literature may not produce a useful balance between the two kinds of misclassification rates, especially when there are few faulty modules. This paper presents a practical classification rule in the context of classification tree models that allows appropriate emphasis on each type of misclassification according to the needs of the project. This is especially important when faulty modules are rare. An industrial case study using classification trees illustrates the tradeoffs. The trees were built using the TREEDISC algorithm, a refinement of the CHAID algorithm. We examined two releases of a very large telecommunications system, and built models suited to two points in the development life cycle: the end of coding and the end of beta testing. Both trees had only five significant predictors, out of 28 and 42 candidates, respectively. We interpreted the structure of the classification trees, and we found the models had useful accuracy.
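
The kind of rule the abstract describes can be imitated with any tree learner that outputs class probabilities (the authors used TREEDISC, a CHAID refinement; the scikit-learn tree below is only a stand-in). Instead of the default 0.5 cut-off, a module is flagged fault-prone when its predicted fault probability exceeds a threshold chosen to balance the two misclassification rates; lowering the threshold catches more of the rare faulty modules at the price of more false alarms. The data here is synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Synthetic module metrics; roughly 10% faulty, mimicking the class imbalance.
n = 1000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.5, n) > 2.2).astype(int)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
p_fault = tree.predict_proba(X)[:, 1]

for threshold in (0.5, 0.2, 0.1):   # lower threshold = more emphasis on finding faults
    pred = (p_fault >= threshold).astype(int)
    type1 = ((pred == 1) & (y == 0)).sum() / max((y == 0).sum(), 1)  # false alarms
    type2 = ((pred == 0) & (y == 1)).sum() / max((y == 1).sum(), 1)  # missed faults
    print(f"threshold={threshold:.1f}: Type I={type1:.2f}, Type II={type2:.2f}")
```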

63 citations


Journal ArticleDOI
TL;DR: It is argued that these indicators are statistics that describe properties of the estimation errors or residuals and that the sensible choice of indicator is largely governed by the goals of the estimator.
Abstract: Building and evaluating prediction systems is an important activity for software engineering researchers. Increasing numbers of techniques and datasets are now being made available. Unfortunately, systematic comparison is hindered by the use of different accuracy indicators and evaluation processes. We argue that these indicators are statistics that describe properties of the estimation errors, or residuals, and that the sensible choice of indicator is largely governed by the goals of the estimator. For this reason it may be helpful for researchers to provide a range of indicators. We also argue that it is useful to formally test for significant differences between competing prediction systems, and we note that where only a few cases are available this can be problematic; in other words, the research instrument may have insufficient power. We demonstrate that this is the case for a well-known empirical study of cost models. Simulation, however, could be one means of overcoming this difficulty.
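
To make the argument concrete: the usual indicators are all summary statistics of the residuals e_i = actual_i - predicted_i, so reporting several of them costs almost nothing. A minimal sketch with invented effort data (the indicator definitions are the standard ones from this literature):

```python
import numpy as np

actual = np.array([120.0, 80.0, 250.0, 40.0, 150.0])     # hypothetical efforts
predicted = np.array([100.0, 95.0, 200.0, 45.0, 170.0])

residuals = actual - predicted
mre = np.abs(residuals) / actual                 # magnitude of relative error

print("MAR   :", np.mean(np.abs(residuals)))     # mean absolute residual
print("MMRE  :", mre.mean())                     # mean MRE
print("MdMRE :", np.median(mre))                 # median MRE (robust to outliers)
print("Pred25:", (mre <= 0.25).mean())           # fraction of estimates within 25%
```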

60 citations


Journal ArticleDOI
TL;DR: A replication of a CREWS project experiment that suggested CREWS use-case authoring guidelines improve the completeness of use-case descriptions; the results show that the CREWS guidelines do not necessarily improve the use-case descriptions.
Abstract: Use cases have become an important tool in software engineering. There has been much focus on the diagram notation but relatively little on use-case descriptions. As part of a welcome and important research project into the use of scenarios in requirements engineering, the CREWS (Co-operative Requirements Engineering With Scenarios, an EU-funded ESPRIT project 21903) team has proposed a set of guidelines for writing use-case descriptions. This paper describes the replication of a CREWS project experiment that suggests CREWS use-case authoring guidelines improve the completeness of use-case descriptions. Our results show that the CREWS guidelines do not necessarily improve the use-case descriptions, only that the subjects implemented varying numbers of guidelines in their use-case descriptions. Subjects in the control group implemented a significant percentage of the guidelines by 'chance'. To further justify our results, we also apply a different marking scheme to compare with the CREWS approach. The results from the alternative marking approach show that there was no significant difference between the qualities of the use-case descriptions across the various groups.

54 citations


Journal ArticleDOI
Anders Wesslén
TL;DR: The results from this replication confirm the results of the original study: size estimation accuracy gets better, defect density gets lower, defects are found earlier, and the pre-compile yield improves during the PSP course.
Abstract: The Personal Software Process (PSP) has during the last couple of years gained attention as a way to achieve individual improvements in software development. The PSP is introduced to students and engineers through a course, which introduces a personal software development process. The personal software development process is improved in steps during the course, and a collection of methods is introduced to support the personal development process. The question is, however, how do these methods influence the performance of an individual engineer? This question has been examined in a study made at the Software Engineering Institute, which showed that the methods in the PSP have a positive effect on the performance of individuals. There is, however, a need to replicate this study to confirm the findings in other settings and with other individuals. This paper describes a replication of the study made at the Software Engineering Institute. Both the original study and this replication are based on data reported by the students taking the PSP course. The differences between the two studies are the programming languages used, who held the courses, the class sizes, and the experience of the students. In summary, the results from this replication confirm the results of the original study: size estimation accuracy gets better, the defect density gets lower, the defects are found earlier, and the pre-compile yield gets better during the PSP course. Basically, the two studies show that the methods in the PSP help engineers to improve their performance.
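
The measures tracked in both studies are straightforward to compute from the data each student reports per assignment. A sketch under invented numbers; the formulas follow the usual PSP definitions, with pre-compile yield taken here, roughly, as the fraction of defects found before the first compile:

```python
# Hypothetical per-assignment PSP data; field names and numbers are invented.
assignments = [
    # estimated LOC, actual LOC, total defects, defects found before compile
    dict(est_loc=100, loc=140, defects=12, pre_compile=4),
    dict(est_loc=150, loc=165, defects=10, pre_compile=6),
    dict(est_loc=200, loc=190, defects=7,  pre_compile=5),
]

for i, a in enumerate(assignments, 1):
    size_error = abs(a["est_loc"] - a["loc"]) / a["loc"]   # relative size estimation error
    density = 1000 * a["defects"] / a["loc"]               # defects per KLOC
    yield_ = a["pre_compile"] / a["defects"]               # pre-compile yield (rough)
    print(f"assignment {i}: size error {size_error:.0%}, "
          f"{density:.0f} defects/KLOC, pre-compile yield {yield_:.0%}")
```

Improvement over the course then shows up as the size error and defect density falling, and the yield rising, across successive assignments.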

36 citations


Journal ArticleDOI
TL;DR: Analysis of the data suggests that collecting observations of software measurement programs with the instrument will lead to more complete knowledge of program success factors, providing assistance to practitioners in an area that has proved notoriously difficult.
Abstract: This paper reports on the development and validation of an instrument for the collection of empirical data on the establishment and conduct of software measurement programs. The instrument is distinguished by a novel emphasis on defining the context in which a software measurement program operates. This emphasis is perceived to be the key to 1) generating knowledge about measurement programs that can be generalised to various contexts, and 2) supporting a contingency approach to the conduct of measurement programs. A pilot study of thirteen measurement programs was carried out to trial the instrument. Analysis of this data suggests that collecting observations of software measurement programs with the instrument will lead to more complete knowledge of program success factors that will provide assistance to practitioners in an area that has proved notoriously difficult.

33 citations


Journal ArticleDOI
TL;DR: The ability to predict the impact of changes in requirements and the cost of related activities is demonstrated; the approach is based on enhanced traceability and an integrated view of the process and product models.
Abstract: We present a case study that aims at quantitative assessment of the impact of requirements changes, and quantitative estimation of costs of the development activities that must be carried out to accomplish those changes. Our approach is based on enhanced traceability and an integrated view of the process and product models. The elements in the process and product models are quantitatively characterised through proper measurement, thus achieving a sound basis for different kinds of sophisticated analysis concerning the impact of requirements changes and their costs. We present the results of the application of modeling and measurement to an industrial project dealing with real-time software development. The ability to predict the impact of changes in requirements and the cost of related activities is shown.
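
The mechanics of such an analysis can be illustrated with a toy traceability graph: a changed requirement's impact set is everything transitively reachable over the trace links, and a cost estimate is the sum of the measured rework costs of the affected elements. The links and cost figures below are invented, not the paper's actual models or measures:

```python
# Trace links from requirements to design, code, and test artifacts (invented).
traces = {
    "REQ-1": ["DES-1", "DES-2"],
    "DES-1": ["CODE-1"],
    "DES-2": ["CODE-2", "TEST-2"],
    "CODE-1": ["TEST-1"],
}
# Hypothetical per-element rework cost in person-days.
rework_cost = {"DES-1": 3, "DES-2": 5, "CODE-1": 8, "CODE-2": 6,
               "TEST-1": 2, "TEST-2": 2}

def impact(changed, links):
    """All elements transitively reachable from the changed requirement."""
    seen, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for child in links.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

affected = impact("REQ-1", traces)
print(sorted(affected), "estimated cost:",
      sum(rework_cost[a] for a in affected), "person-days")
```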

Journal ArticleDOI
TL;DR: Studies whether procedural roles (moderator, reader, recorder) affect group performance, particularly in terms of process loss, and suggests how procedural roles might have a greater impact on group performance.
Abstract: Software inspections are important for finding defects in software products (Fagan, 1976; Gilb, 1993; Humphrey, 1995; Strauss and Ebenau, 1994). A typical inspection includes two stages: individual preparation followed by a group review with roles assigned to each reviewer. Research has shown that group tasks typically result in process loss (Lorge et al., 1958; Steiner, 1972). In software defect detection, too, many defects found during individual preparation are subsequently not reported by the group (Porter and Votta, 1994; Porter et al., 1995, 1997; Land et al., 1997a, 1997b; Siy, 1996; Votta, 1993). Our objective is to study whether procedural roles (moderator, reader, recorder) affect group performance, particularly in terms of process loss. At the same time, the use of roles in software reviews has not been empirically validated, although there are wide claims for their benefits. Procedural roles made a limited difference to group performance. Further analyses provide possible explanations for the results and a deeper understanding of how groups make their decisions based on individual reviewers' findings. Limitations of the research are discussed. We also suggest how procedural roles might have a greater impact on group performance.
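
Process loss in this setting has a simple operationalisation: defects found by at least one reviewer during preparation (the "nominal group" result) that the group meeting then fails to report. A toy sketch with invented defect sets:

```python
# Defect IDs found by each reviewer during individual preparation (invented).
reviewer_findings = {
    "moderator": {1, 2, 5, 8},
    "reader":    {2, 3, 5, 9},
    "recorder":  {1, 4, 5},
}
group_report = {1, 2, 3, 5, 10}   # 10 is a meeting gain, found only in the meeting

nominal = set().union(*reviewer_findings.values())  # nominal-group result
loss = nominal - group_report                       # process loss
gain = group_report - nominal                       # meeting gain
print(f"nominal: {sorted(nominal)}, "
      f"process loss: {sorted(loss)}, meeting gain: {sorted(gain)}")
```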

Journal ArticleDOI
Alessandro Maccari, Claudio Riva
TL;DR: It emerged that CASE tool support is considered most useful for the following functions: graphical drawing, automatic documentation generation, and storage of diagrams; the results also point to a mismatch between the features required by the developers and those offered by CASE products.
Abstract: We present the results of a research effort targeted at understanding the domains and consequences of CASE tool usage in Nokia. We aim to evaluate the importance of the various CASE tool features, as rated by our developers, and how well such features are implemented in currently available CASE tools. A structured questionnaire was sent to our most experienced developers and CASE users. From this survey, it emerged that CASE tool support is considered most useful for the following functions: graphical drawing, automatic documentation generation, and storage of diagrams. The results point to a mismatch between the features required by the developers and those offered by CASE products. Further research is needed before more definite conclusions can be drawn.

Journal ArticleDOI
TL;DR: It is found that people can choose the relevant frames with a reasonable degree of accuracy, but that this improves where they work to provide a collective solution, and that experience appears to improve the accuracy with which groups can collectively choose relevant frames.
Abstract: Problem frames are a relatively new approach to requirements engineering, promising benefits not only in elicitation but also in subsequent design, by allowing their users to select methods and techniques appropriate to their given problem domain. In order to be effective this approach relies upon the correct identification of relevant problem frames for a given problem or scenario. Hence, we examine whether people are able to identify the correct (relevant) frames for a given set of problem descriptions, and whether they can correctly gauge the relative contribution of each identified frame to the given problem. We note the Euclidean distance of (individual and group) answers from an expert solution, considering each problem frame as a separate dimension. Examination of this distance (or magnitude of error) allows us to gauge the accuracy with which people can assign problem frames. We compare the performance of individuals within groups, and the performance where groups work together to provide a collective solution, comparing both of these with a fair-distribution strategy. We found that people can choose the relevant frames with a reasonable degree of accuracy, but that this is improved where they work to provide a collective solution. We also note differences among groups, for example, that experience appears to improve the accuracy with which groups can collectively choose relevant frames.
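
The error measure is easy to state precisely: treating each problem frame as a dimension, an answer is a vector of frame weights, and the magnitude of error is its Euclidean distance from the expert's vector. A minimal sketch with invented weights:

```python
import numpy as np

# Weight assigned to each of five problem frames for one problem description;
# both vectors are invented for illustration.
expert = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
subject = np.array([0.4, 0.2, 0.2, 0.2, 0.0])

error = np.linalg.norm(subject - expert)   # magnitude of error, per the paper
print(f"distance from expert solution: {error:.3f}")
```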

Journal ArticleDOI
TL;DR: Shows the capabilities of production models and illustrates how production models can be combined with other approaches to allow for assessing, and hence understanding, software project data.
Abstract: One of the goals of collecting project data during software development and evolution is to assess how well the project did and what should be done to improve in the future. With the wide range of data often collected and the many complicated relationships between them, this is not always easy. This paper suggests using production models (Data Envelopment Analysis, DEA) to analyze objective variables and their impact on efficiency. To understand the effect of subjective variables, it is suggested to apply principal component analysis (PCA). Further, we propose to combine the results from the production models and the analysis of the subjective variables. We show the capabilities of production models and illustrate how production models can be combined with other approaches to allow for assessing, and hence understanding, software project data. The approach is illustrated on a data set consisting of 46 software projects from the NASA-SEL database (NASA-SEL, 1992). The data analyzed is of the type that is commonly found in project databases.
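
Data Envelopment Analysis reduces to one small linear program per project. The sketch below solves the input-oriented, constant-returns-to-scale (CCR) model for a single input (effort) and a single output (delivered size); the paper's production models are richer, and all figures here are invented:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical project data: input x_j = effort, output y_j = delivered KLOC.
effort = np.array([100.0, 80.0, 120.0, 60.0])
kloc   = np.array([50.0, 45.0, 55.0, 20.0])
n = len(effort)

for o in range(n):
    # Decision variables: theta, lambda_1..lambda_n. Minimise theta subject to
    #   sum(lambda_j * x_j) <= theta * x_o   (composite input no larger)
    #   sum(lambda_j * y_j) >= y_o           (composite output at least y_o)
    c = np.r_[1.0, np.zeros(n)]
    A_ub = [np.r_[-effort[o], effort],   # sum(lambda*x) - theta*x_o <= 0
            np.r_[0.0, -kloc]]           # -sum(lambda*y) <= -y_o
    b_ub = [0.0, -kloc[o]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    print(f"project {o + 1}: efficiency = {res.x[0]:.2f}")
```

A project with efficiency 1.0 lies on the efficient frontier; lower values indicate how much its input could, in principle, shrink while keeping its output.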

Journal ArticleDOI
TL;DR: It is concluded that studies comparing extreme programming approaches with conventional CASE tool approaches are needed to help determine whether the struggle to understand the constraint environment at a high level of abstraction is worthwhile, and that every subject-based experiment should consider and understand the performance of individuals.
Abstract: This paper reports the results of an experiment undertaken for the CADPRO (Constraints And the Decision PROject) project. Subjects with varied experience produced data flow diagrams (DFDs) using a DFD tool generated by CASEMaker, a meta-CASE tool. Half the subjects received routine notice of instances of internal (as opposed to hierarchical) methodological constraint violations via an unobtrusive window, whilst the other half did not. The DFD tool automatically recorded subjects' delivery and constraint profiles. Video records, observer notes, and subject debriefings were also used to yield other performance data. While evidence was found in support of the research model underpinning the CADPRO project, the model needs to be revised to take into account the effects of human-computer interface constraints and the different speeds at which people work. We learnt an important lesson about subject randomisation: do not assume that all subjects can be treated alike simply because they share the minimum experience thought necessary for the problem. We believe it is important for every subject-based experiment to consider and understand the performance of individuals. Because of the complexity of constraint environments in CASE tools, we also conclude that studies comparing extreme programming approaches with conventional CASE tool approaches are needed to help determine whether the struggle to understand the constraint environment at a high level of abstraction is worthwhile. Further experiments, possibly replication variants of this one, are needed to help validate our interpretations.

Journal ArticleDOI
TL;DR: Two intensive, longitudinal case studies were conducted at IBM Hursley Park to investigate the effects of various aspects of process on a project’s schedule and subsequent duration and generated three models and numerous insights into software project behaviour.
Abstract: Two intensive, longitudinal case studies were conducted at IBM Hursley Park to investigate the effects of various aspects of process on a project's schedule and subsequent duration. The case studies examined the actual behaviour of the two projects in depth; developed conceptual structures relating the lower-level processes of each project to the higher-level processes; related the lower-level and higher-level processes to project duration; and tested Bradac et al.'s (1994) conjecture that waiting is more prevalent during the end of a project than during the middle of a project. A large volume of qualitative and quantitative evidence was collected and analysed for the two projects (Project B and Project C). This evidence includes minutes of status meetings, interviews, project schedules, and information from feedback workshops (which were conducted several months after the completion of the projects). The analyses generated three models and numerous insights into software project behaviour. The three models are: a model of software project schedule behaviour; a model of capability; and an integrated model of schedule behaviour and capability. The insights concerned characteristics of a project, i.e. the actual progress of phases and milestones, the amount of work on a project, the capability of a project, tactics of management, and sociotechnical aspects of a project.

Journal ArticleDOI
TL;DR: This paper verifies and corrects the Halstead Length Estimator skewness for a large collection of `C' programs of varying sizes.
Abstract: Numerous studies have confirmed the skewness of Halstead's Software Science Length Estimator (Beser, 1983; Gonzales, 1990). The Length Estimator consistently underestimates the size of 'small' programs (program size < 4000 tokens). This paper verifies and corrects the Halstead Length Estimator skewness for a large collection of 'C' programs of varying sizes.
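
For reference, the estimator in question is Halstead's length equation, N_hat = n1*log2(n1) + n2*log2(n2), where n1 and n2 are the counts of distinct operators and operands. A minimal implementation with invented example counts:

```python
import math

def halstead_length_estimate(n1: int, n2: int) -> float:
    """Halstead's estimated program length:
    N_hat = n1*log2(n1) + n2*log2(n2),
    where n1/n2 are the counts of distinct operators/operands."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)

# Hypothetical program: 20 distinct operators, 35 distinct operands.
n1, n2 = 20, 35
N_actual = 310   # invented observed length (total operator + operand occurrences)
N_hat = halstead_length_estimate(n1, n2)
print(f"estimated length {N_hat:.0f} vs actual {N_actual}")
```

Skewness studies of the kind the paper describes compare N_hat against the actual token count N across many programs, binned by program size.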

Journal ArticleDOI
TL;DR: About three years ago the National Research Council Canada established a Research Ethics Board (REB) to review all research proposals involving human participation, and this became the starting point of a two-year journey to learn about ethical research practices and their application to empirical software engineering.
Abstract: About three years ago the National Research Council Canada established a Research Ethics Board (REB) to review all research proposals involving human participation. At the time, my university collaborator and I were gathering requirements for a software engineering exploration tool. Human participation in our research had three components: 1) web-based questionnaires; 2) interviews; and 3) observation sessions. The subjects/participants in our study were sampled from a relatively large group (about 18 software engineers) for whom the tool was being developed. The engineers' employer provided financial support for the research. When the REB was established, it became necessary for me to get their approval of my research protocol. Given that we were using fairly typical IS and empirical software engineering methodologies, I did not anticipate any problems with my proposal. Nonetheless, REB members had concerns regarding potential ethical problems. I was unable to quell their concerns because I could not point to an accepted standard for ethical research practices within our field. This became, for me, the starting point of a two-year journey to learn about ethical research practices and their application to empirical software engineering. Now I understand the REB's concerns, although I am not sure they are fully warranted. Below I detail the REB's five most pressing issues. The first ethical problem had to do with the research approval process. Before interviewing individual employees, the company asked that I check with their manager to make sure that I was not bothering employees during a critical time. This is a reasonable enough request from the employer's perspective, because sometimes the employees could not be disturbed. However, from an ethical perspective, this allowed the manager to know who I would be interviewing and who I would not be interviewing. This compromised the anonymity of the research subjects, and opened the gate for them to feel that they were being coerced into participating. The reasoning is that their manager knew I was approaching them, and they knew their manager knew, so they felt obligated to participate. The second ethical problem centered on whether the company should be named in the research report. My particular research did not reveal any sensitive information, but related research did. When the related research was published, the company was thanked by name in the acknowledgment section. While the company did agree to this, the report could conceivably harm the company. That is, the company could …


Journal ArticleDOI
TL;DR: Jankowski (1997) performed a study of methodological support in which each of two CASE tools was used by eight project teams (students) over a two-month period to construct a specification of a hotel information system, and found that internal consistency rules were easily adhered to regardless of the level of methodological support provided.
Abstract: Jankowski (1997) performed a study of methodological support in which each of two CASE tools (Visible Systems Corporation's Visible Analyst Workbench (VAW) Version 3.1 and Excelerator Version 1.9 from Intersolv) was used by eight project teams (students) over a two-month period to construct a specification of a hotel information system. Jankowski's research hypothesis was that, for a given methodology rule (constraint), the number of violations encountered would increase as the restrictiveness with which the rule (constraint) is implemented decreases. Jankowski found that internal consistency rules (applied to one diagram at a time) were easily adhered to regardless of the level of methodological support provided, and that hierarchical consistency rules were adhered to more frequently in the presence of rigorous methodological support. Jankowski, however, did not address other differences between the human-computer interfaces of the two CASE tools used in his study. This correspondence raises the possibility that these other differences may have been a significant contributory factor and can explain a lot about his data. Our own work has revealed to us how important human-computer interface constraints are in shaping end-user behaviour with CASE tools (Brooks et al., 1998; Takada et al., 1998). One of us also personally recalls poor end-user experiences using the Excelerator CASE tool (Version 1.9) as a student in the early 1990s, such that attempts to maintain the consistency of the underlying repository were abandoned.