
National Research
Council Canada
Institute for
Information Technology
Conseil national
de recherches Canada
Institut de technologie
de l'information
Preliminary Guidelines for Empirical Research in
Software Engineering *
Kitchenham, B.A., Pfleeger, S.L., Pickard, L.M., Jones, P.W.,
Hoaglin, D.C., El-Emam, K., Rosenberg, J.
January 2001
* published as NRC/ERB-1082. January 2001. 27 pages. NRC 44158.
Copyright 2001 by
National Research Council of Canada
Permission is granted to quote short excerpts and to reproduce figures and tables from this report,
provided that the source of such material is fully acknowledged.


Preliminary guidelines for empirical research in software
engineering
Barbara A. Kitchenham*, Shari Lawrence Pfleeger**, Lesley M. Pickard*,
Peter W. Jones*, David C. Hoaglin***, Khaled El-Emam****,
and Jarrett Rosenberg*****
* Keele University, Keele, Staffordshire, UK
** Systems/Software, Inc., Washington, DC, USA
*** Abt Associates Inc., Cambridge, MA, USA
**** National Research Council of Canada, Ottawa, Ontario, Canada
***** Sun Microsystems, Palo Alto, CA, USA
Abstract
Empirical software engineering research needs research guidelines to improve the
research and reporting processes. We propose a preliminary set of research guidelines
aimed at stimulating discussion among software researchers. They are based on a review
of research guidelines developed for medical researchers and on our own experience in
doing and reviewing software engineering research. The guidelines are intended to assist
researchers, reviewers and meta-analysts in designing, conducting and evaluating
empirical studies. Editorial boards of software engineering journals may wish to use our
recommendations as a basis for developing guidelines for reviewers and for framing
policies for dealing with the design, data collection and analysis and reporting of
empirical studies.
Keywords: empirical software research; research guidelines; statistical mistakes.
1. Introduction
We have spent many years both undertaking empirical studies in software engineering
ourselves, and reviewing reports of empirical studies submitted to journals or presented
as postgraduate theses or dissertations. In our view, the standard of empirical software
engineering research is poor. This includes case studies, surveys and formal experiments,
whether observed in the field or in a laboratory or classroom. This statement is not a
criticism of software researchers in particular; many applied disciplines have problems
performing empirical studies. For example, Yancey [50] found that many articles in the
American Journal of Surgery (1987 and 1988) contained methodological errors serious
enough to invalidate the authors' conclusions. McGuigan [31] reviewed 164 papers
containing numerical results published in the British Journal of Psychiatry in 1993
and found that 40% of them had statistical errors. When Welch and Gabbe [48] reviewed
clinical articles in six issues of the American Journal of Obstetrics and Gynecology,
they found more than half the studies impossible to assess because the statistical
techniques used were not reported in sufficient detail. Furthermore, nearly one third
of the articles contained inappropriate uses of statistics. If researchers have
difficulty in a discipline such as medicine, which has a rich history of empirical
research, it is hardly surprising that software engineering researchers have problems.
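To make "inappropriate uses of statistics" concrete, consider one of the most common
mistakes in applied research: performing many significance tests without adjusting the
significance level. The minimal Python sketch below (our illustration, not drawn from
the cited reviews) shows how quickly the family-wise false-positive rate grows as
uncorrected tests accumulate:

    # Probability of at least one spurious "significant" result when running
    # k independent tests of true null hypotheses at significance level alpha.
    alpha = 0.05
    for k in (1, 5, 10, 20):
        familywise = 1 - (1 - alpha) ** k
        print(f"{k:2d} uncorrected tests -> P(>=1 false positive) = {familywise:.2f}")

With ten uncorrected tests, the chance of at least one false positive is already about
0.40, which is one reason reviewers need studies to report exactly which statistical
techniques were used.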
In a previous investigation of the use of meta-analysis in software engineering [34], three
of us identified the need to assess the quality of the individual studies included in a meta-
analysis. In this paper, we extend those ideas to discuss several guidelines that can be
used both to improve the quality of ongoing and proposed empirical studies and to
encourage critical assessment of existing studies. We believe that adoption of such
guidelines will not only improve the quality of individual studies but will also increase
the likelihood that we can use meta-analysis to combine the results of related
studies. The guidelines presented in this paper are a first attempt; wider debate is
needed before the software engineering research community can develop and agree on
definitive guidelines.
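To illustrate what combining results involves, the following minimal Python sketch
implements a fixed-effect (inverse-variance) meta-analysis, one standard way of
pooling effect estimates across studies; the three studies and their effect sizes
below are hypothetical:

    import math

    def fixed_effect_meta(effects, std_errors):
        # Weight each study by the inverse of its variance, so that more
        # precise studies contribute more to the pooled estimate.
        weights = [1.0 / se ** 2 for se in std_errors]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        pooled_se = math.sqrt(1.0 / sum(weights))
        return pooled, pooled_se

    # Hypothetical per-study effect estimates and standard errors.
    effects = [0.30, 0.12, 0.45]
    std_errors = [0.10, 0.08, 0.20]

    pooled, se = fixed_effect_meta(effects, std_errors)
    print(f"pooled effect = {pooled:.3f}, "
          f"95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f})")

A pooled estimate like this is only meaningful when the individual studies are sound
and fully reported, which is precisely why guidelines for primary studies matter to
meta-analysts.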
Before we describe our guidelines, it may be helpful to understand who we are and
how we developed these guidelines. Kitchenham, Pickard, Pfleeger and El-Emam are
software engineering researchers with backgrounds in statistics as well as computer
science. We regularly review papers and dissertations, and we often participate in
empirical research. Rosenberg is a statistician who applies statistical methods to software
engineering problems. Jones is a medical statistician with experience in developing
standards for improving medical research studies. Hoaglin is a statistician who has long
been interested in software and computing. He reviewed eight papers published in
Transactions on Software Engineering in the last few years. These papers were not
chosen at random. Rather, they were selected (by those of us whose primary focus is
software engineering) because their authors are well-known for their empirical software
engineering work, and because their techniques are typical of papers submitted to this
journal. Hoaglin's independent comments on these papers confirmed our suspicions that
the current state of empirical studies as published in Transactions on Software
Engineering is similar to that found in medical studies. He found examples of poor
experimental design, inappropriate use of statistical techniques and conclusions that did
not follow from the reported results. We omit the titles of these papers. We want the
focus of our guidelines to be overall improvement of our discipline, not finger-pointing at
previous work. We do, however, cite papers that include specific statistical mistakes
when they help illustrate the reason that a particular guideline should be followed.
The main sources for this paper, apart from our own experience, are:
- The Yancey paper already mentioned. Yancey identifies ten rules for reading
  clinical research results. Many of the rules can also serve as guidelines for
  authors.
- A paper by Sacks et al. [43] that considers quality criteria for meta-analyses of
  randomized controlled trials. Sacks et al. point out that the quality of papers
  included in a meta-analysis is important. In particular, they suggest considering
  the quality of features such as the randomization process, the statistical
  analysis, and the handling of withdrawals.
- A paper on guidelines for contributors to journals by Altman [1].
- The guidelines for statistical review of general papers and clinical trials
  prepared by the British Medical Journal. (These guidelines are listed in Altman
  et al. [3], chapter 10 of Gardner and Altman [14], and on the journal's web page:
  http://www.bmj.com/advice)
- A book by Lang and Secic [28] with guidelines for reporting medical statistics.
- The CONSORT statement on reporting the results of randomized trials in medicine
  [4]. This statement has been adopted by seventy medical journals.
