University of Groningen
Identifying, categorizing and mitigating threats to validity in software engineering secondary studies
Ampatzoglou, Apostolos; Bibi, Stamatia; Avgeriou, Paris; Verbeek, Marijn; Chatzigeorgiou, Alexander
Published in: Information and Software Technology
DOI: 10.1016/j.infsof.2018.10.006
Document version: Publisher's PDF, also known as Version of record
Publication date: 2019
Citation (APA): Ampatzoglou, A., Bibi, S., Avgeriou, P., Verbeek, M., & Chatzigeorgiou, A. (2019). Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Information and Software Technology, 106, 201-230. https://doi.org/10.1016/j.infsof.2018.10.006

Information and Software Technology 106 (2019) 201–230
Identifying, categorizing and mitigating threats to validity in software engineering secondary studies

Apostolos Ampatzoglou a,⁎, Stamatia Bibi b, Paris Avgeriou c, Marijn Verbeek c, Alexander Chatzigeorgiou a

a Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece
b Department of Informatics and Telecommunications, University of Western Macedonia, Kozani, Greece
c Department of Mathematics and Computer Science, University of Groningen, the Netherlands

⁎ Corresponding author. E-mail address: apostolos.ampatzoglou@gmail.com (A. Ampatzoglou).
Keywords:
Empirical software engineering
Secondary studies
Threats to Validity
Literature Review
Abstract

Context: Secondary studies are vulnerable to threats to validity. Although mitigating these threats is crucial for the credibility of these studies, we currently lack a systematic approach to identify, categorize and mitigate threats to validity for secondary studies.
Objective: In this paper, we review the corpus of secondary studies, with the aim to identify: (a) the trend of reporting threats to validity, (b) the most common threats to validity and corresponding mitigation actions, and (c) possible categories in which threats to validity can be classified.
Method: To achieve this goal we employ the tertiary study research method, which is used for synthesizing knowledge from existing secondary studies. In particular, we collected data from more than 100 studies, published until December 2016 in top quality software engineering venues (both journals and conferences).
Results: Our results suggest that in recent years, secondary studies are more likely to report their threats to validity. However, the presentation of such threats is rather ad hoc, e.g., the same threat may be presented with a different name, or under a different category. To alleviate this problem, we propose a classification schema for reporting threats to validity and possible mitigation actions. Both the classification of threats and the associated mitigation actions have been validated by an empirical study, i.e., Delphi rounds with experts.
Conclusion: Based on the proposed schema, we provide a checklist, which authors of secondary studies can use for identifying and categorizing threats to validity and corresponding mitigation actions, while readers of secondary studies can use the checklist for assessing the validity of the reported results.
1. Introduction
Empirical Software Engineering (ESE) research focuses on the application of empirical methods on any phase of the software development lifecycle. The three predominant types of empirical research are [44,47]: (a) surveys, which are performed through questionnaires or interviews on a sample in order to obtain characteristics of a population [36]; (b) case studies, which study phenomena in a "real-world" context, especially when the boundaries between phenomenon and context are not clear [51]; and (c) experiments, which have a limited scope and are most often run in a laboratory setting, with a high level of control [47].
In recent years, and mainly due to the rise of the Evidence-Based Software Engineering (EBSE) Paradigm¹ [22], two other types of studies have become quite popular [15]:
• Systematic Literature Reviews (SLRs) use data from previously published studies for the purpose of research synthesis, which is the collective term for a family of methods for summarizing, integrating and, when possible, combining the findings of different studies on a topic or research question. Such synthesis can also identify crucial areas and questions that have not been addressed adequately with past empirical research. It is built upon the observation that, no matter how well-designed and executed, empirical findings from individual studies are limited in the extent to which they may be generalized [18].
¹ EBSE is a movement in software engineering research that aims to provide the means by which current best evidence from research can be integrated with practical experience [22].
https://doi.org/10.1016/j.infsof.2018.10.006
Received 20 February 2018; Received in revised form 4 October 2018; Accepted 6 October 2018
Available online 9 October 2018

• Systematic Mapping Studies (SMSs), which use the same basic methodology as SLRs but aim to identify and classify all research related to a broad software engineering topic, rather than answering questions about the relative merits of competing technologies that conventional SLRs address. They are intended to provide an overview of a topic area and identify whether there are sub-topics with sufficient primary studies to conduct conventional SLRs, and also to identify sub-topics where more primary studies are needed [21].
The strength of evidence produced by ESE research depends largely on the use of systematic, rigorous guidelines on how to conduct and report empirical results (see e.g., for experiments [47], for SLRs [18], for mapping studies [34], for surveys [36], and for case studies [38]). One of the most crucial parts of conducting an empirical study is the management of threats to validity, i.e., possible aspects of the research design that in some way compromise the credibility of results. Despite this crucial role, we currently lack guidelines on how to identify, mitigate, and categorize threats to validity in secondary studies; this is in contrast to experiments, case studies and surveys, where mature guidelines exist. For this reason, researchers either do not report threats to validity for secondary studies, or report them in an ad hoc way (see Section 5). Specifically, the most common issues found in practice concern threats to validity being:
• Completely missing from certain studies. Thus, such studies do not provide any mitigation actions for them;
• Incorrectly categorized. The same threat is classified in different categories by different researchers (e.g., study selection bias is categorized in some studies as a threat to internal validity and in others as a threat to conclusion validity). Also, in some cases threats are inefficiently categorized based on guidelines for other types of empirical research (e.g., for experiments [45], or for case studies [38]), or under a custom categorization, which is not uniform. One possible reason for this problem is the fact that threat categories are not orthogonal, especially in cases where they stem from different schools of thought or guidelines (see Section 2.1). For example, reliability examines if the results of a study depend highly on the involved researchers. In turn, this relates to conclusion validity, in the sense that people are prone to biases (e.g., due to previous experiences, preferences on research, etc.);
• Inconsistently named. The same threat is reported with a different name by different researchers (e.g., the terms publication bias and researcher bias are used for describing the same threat);
• Inconsistently mitigated. The same threat is mitigated differently by different researchers. Although this provides a variety of available mitigation actions, some mitigation actions are ineffective and cause confusion to readers who consider following them.
These issues, in turn, lead to difficulty in evaluating the validity of the reported results and hinder a uniform comparison between secondary studies. In addition, the lack of guidance for mitigating threats to validity, which could serve as a reference point, makes it more difficult to reuse mitigation strategies, as well as to consistently identify and categorize both threats and mitigation actions.
To address this problem, we conducted a tertiary study (i.e., an SLR on secondary studies), so as to retrieve and analyze how software engineering secondary studies identify, categorize and mitigate threats to validity. The objective of this tertiary study is: "to summarize secondary studies that report threats to validity, with the aim of identifying: (a) the frequency of reporting threats to validity over the years, (b) the most common threats to validity and (c) the corresponding mitigation actions, and (d) a possible classification schema of threats to validity". The main outcomes of the study are a classification schema for threats to validity and a checklist that can be used while conducting/evaluating secondary studies. The outcomes are expected to contribute towards establishing a standard and consistent way of identifying, categorizing and mitigating threats to validity of secondary studies. In addition, in order to enrich the outcomes of this work, we explored existing literature in two related research sub-fields: (a) secondary studies in medical science (i.e., the area from which the Evidence-Based paradigm has emerged), and (b) guidelines for conducting secondary studies. Related studies from medical science and the guidelines for performing secondary studies have led to the identification of best practices in secondary studies that can be applied as mitigation actions for minimizing the effects of a validity threat, enriching the checklist that has been derived from the classification schema. Finally, acknowledging the subjectivity in the qualitative nature of this work, we validated the outcomes through a Delphi method based on the opinion of experts in secondary studies and empirical studies in general. The Delphi method was iterated in three rounds and provided preliminary evidence for the merits of the classification schema and checklist.
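To make the intended use of such a checklist concrete, the sketch below (hypothetical Python; the three category names follow the schema discussed later in the paper, while the threat and mitigation wording is our own illustration) encodes threats and mitigation actions per category and reports which checklist threats a given secondary study left unaddressed:

# Hypothetical sketch of the checklist as a lookup table; category names
# (study selection, data, research validity) follow the paper's schema,
# but the threat/mitigation entries below are illustrative only.
CHECKLIST = {
    "study selection validity": {
        "missing relevant studies": [
            "search multiple digital libraries",
            "cross-check the study set against other tertiary studies",
        ],
    },
    "data validity": {
        "small sample size": [
            "broaden the search scope",
            "avoid generalizing from a small sample",
        ],
    },
    "research validity": {
        "lack of repeatability": [
            "publish the review protocol and dataset",
        ],
    },
}

def unaddressed(reported_threats):
    """Return checklist threats that a secondary study did not report."""
    return [
        threat
        for threats in CHECKLIST.values()
        for threat in threats
        if threat not in reported_threats
    ]

# A study that only discussed its search strategy:
print(unaddressed({"missing relevant studies"}))
# -> ['small sample size', 'lack of repeatability']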
We note that literature reviews were performed long before the advent of the terms 'Systematic Mapping Study' and 'Systematic Literature Review' and the corresponding guidelines. We also acknowledge that secondary studies can be performed without following the guidelines for SMSs and SLRs (especially before the two terms became popular). However, such non-systematic literature reviews have, in the vast majority of cases, not reported threats to their conclusions. Reporting of threats became popular once specific guidelines were proposed and adopted in the context of the EBSE paradigm. Thus, for a study aiming at systematically analyzing the reported threats, we consider it proper to focus on the studies that have adopted the corresponding guidelines. For the rest of the study, when we refer to secondary studies, we refer to Systematic Mapping Studies and Systematic Literature Reviews.
The rest of the paper is organized as follows: Section 2 presents related work, i.e., categories of threats to validity in other empirical methods; Section 3 presents our tertiary study protocol; Section 4 reports on the results; and Section 5 discusses the proposed guidelines for identifying, categorizing and mitigating threats to validity for secondary studies in software engineering. In Section 6, we present the design and results of our validation study, whereas in Sections 7 and 8 we present threats to validity and conclude the paper.
2. Related work
The empirical software engineering literature points out the relevance and importance of identifying and recording validity threats as an aspect of research quality [12,32] and [35]. According to Perry et al. [32], the structure of an empirical study in SE should include a section on threats to validity. This section should discuss the influences that may limit the authors' and readers' ability to interpret or draw conclusions from the study's data. In addition, Jedlitschka et al. [17] suggest that each controlled experiment in SE should have a subsection named "Limitation of the study", where all threats that may have an impact on the validity of results should be mentioned. Furthermore, Kitchenham [22] has also underlined the importance of threats to validity, by highlighting that the implications of a validity threat should be addressed and thoroughly discussed. Finally, Sjoberg et al. [42] emphasize the scope of validity of the results of an SE study; the term 'scope of validity' is interpreted as the population of actors, technologies, activities, and software systems for which the results of a study are valid. The scope of validity is considered crucial for producing general knowledge synthesized by comparing and integrating results from different studies.
In this section we present related work under three perspectives. First, we present how threats to validity are categorized in the empirical software engineering field (see Section 2.1). Second, in Section 2.2, we present studies that are related to the identification and reporting of threats to validity in medical science. This can provide valuable input for our work, since medical research is considered a more mature field in secondary study design and execution and has already inspired the guidelines for conducting secondary studies in software engineering. Finally, in Section 2.3, we present the most common guidelines for performing secondary studies in the software engineering domain, as they can also provide input for our work.

Table 1
Categories of Threats to Validity in ESE Research.

Conclusion validity: Originally called "statistical conclusion validity", this aspect deals with the degree to which conclusions reached (e.g., about relationships between factors) are reasonable within the data collected. Researcher bias, for example, can greatly impact conclusions reached and can be considered a threat to conclusion validity. Similarly, statistical analysis may lead to weak results that can be interpreted in different ways according to the bias of the researcher. In either case the researcher may reach the wrong conclusion [47].

Reliability: This aspect is concerned with the extent to which the data and the analysis depend on the specific researchers. An example of this type of threat is the unclear coding of collected data. If a researcher produces certain results, then other researchers should be able to reproduce identical results following the same methodology of the study [38].

Internal validity: This aspect relates to the examination of causal relations. Internal validity examines whether an experimental treatment/condition makes a difference or not, and whether there is evidence to support the claim [47].

Construct validity: Defines how effectively a test or experiment measures up to its claims. This aspect deals with whether or not the researcher measures what is intended to be measured [47].

External validity: The concern of this aspect is whether the results can be generalized. During the analysis of this validity, the researcher attempts to see if findings of the study are of relevance for others. In the case of quantitative research (experiments), this primarily relies on the chosen sample size. In contrast, case studies normally have a low sample size, so the researcher has to try and analyze to what extent the findings can be related to other cases [47].
2.1. Threats to validity in empirical software engineering
Threats to validity have often been categorized into different types in the literature on general research methods. Initially, Cook and Campbell [8]² recorded four types of validity threats in quantitative experimental analysis: statistical conclusion validity, internal validity, construct validity of putative causes and effects, and external validity. Concerning qualitative research, Maxwell [29] provided a general categorization of threats that can be mapped to Cook and Campbell's categorization as follows: theoretical validity (construct validity), generalizability (internal, external validity), and interpretive validity (statistical conclusion validity). An additional threat category, mentioned by Maxwell [29], is descriptive validity, which is relevant only for qualitative studies. Descriptive validity reflects the accuracy and objectivity of the information gathered. For example, when researchers collect statements from participants, threats to validity can be related to the way that researchers recorded or transcribed the statements. Other types of validity threats found in the literature are: reliability [38,51]; transferability, credibility and confirmability [27]; and uncontrollability and contingency [14].

² Before publishing this paper (i.e., [8]), Cook and Campbell had published an online chapter focused on Conclusion and Internal validity threats.
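For quick reference, the correspondence between Maxwell's qualitative categories and Cook and Campbell's quantitative ones, as stated above, can be written down as a small lookup table (an illustrative Python rendering, not an artifact of either source):

# Maxwell's [29] categories mapped onto Cook and Campbell's [8], per the text.
MAXWELL_TO_COOK_CAMPBELL = {
    "theoretical validity": ["construct validity"],
    "generalizability": ["internal validity", "external validity"],
    "interpretive validity": ["statistical conclusion validity"],
    "descriptive validity": [],  # qualitative-only, no quantitative counterpart
}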
In the empirical SE community there are two main schools on reporting threats to validity: (a) Wohlin et al. [47], who adopted Cook and Campbell's [8] categorization of validity threats and presented four main types of threats to validity for quantitative research within software engineering: conclusion, internal, construct, and external validity; and (b) Runeson et al. [38], who discussed four main types of validity threats for case studies within software engineering: reliability, internal, construct, and external validity. The threats of Runeson et al. [38] are similar to those of Wohlin et al. [47], with the exception of reliability replacing conclusion validity.
Bi et al. [4] argue that researchers should also consolidate actual
experimental research on a specic topic to complement existing generic
threats and guidelines when performing their research. The tradeo be-
tween internal and external validity has been addressed by Siegmund
et al. [40] , where the authors performed a survey and concluded that
externally valid papers are of greater practicality while internally valid
studies seem to be unrealistic. Additionally, the study examined the im-
pact of replication studies and found that although researchers realize
the necessity of such studies they are reluctant to conduct or review
them mainly due to the fact that there are no guidelines for performing
them [40] . A list of denitions of the union of the aforementioned cat-
egories of threats to validity (i.e. from [38] and [47] ) are presented in
Table 1 .
Petersen et al. [35], based on the categorizations of threats to validity suggested by Maxwell, suggested a checklist that can help researchers identify the threats applicable to the type of research performed, by reporting first their world-view and then the research method applied. A secondary study attempting to assess the practices in reporting validity threats in ESE [12] concluded that more than 20% of the studied papers contain no discussion of validity threats, and those that do discuss validity threats contain, on average, 5.44 threats.
Regarding threats to validity for secondary studies in software engineering, we have been able to identify only one related work. In particular, Zhou et al. [53] have performed a tertiary study on more than 300 secondary studies published until 2015. The authors have identified 23 threats to validity for secondary studies, and organize the consequences of these threats into four categories: internal, external, conclusion, and construct validity. To alleviate these threats, the authors map the threats and possible consequences to 24 mitigation strategies. This paper shares common goals with our study; however, ours is broader in the sense that: (a) it covers a wider timeframe (until 2017 instead of the middle of 2015); (b) it focuses only on top-quality venues, which are expected to pay special attention to the proper application of methodological guidelines, such as the proper reporting of threats to validity, a fact that increases the quality of the obtained data; and, most importantly, (c) our study answers two additional RQs, providing a classification schema and a checklist for identifying, mitigating, and reporting threats to validity. In addition, as indirect related work (especially in terms of mitigation actions), in Section 2.3 we present a review of guidelines on secondary studies in software engineering.
2.2. Threats to validity in medical science
In this section we report on quality assessment strategies for systematic reviews from the medical science literature. While there is no classification of threats to validity for secondary studies, or corresponding mitigation actions, in medical research, these quality assessment strategies can provide useful input for deriving such outcomes in the software engineering domain. In particular, we identify a number of quality assessment criteria based on the guidelines, checklists and protocols found in the medical research literature. These quality assessment criteria are subsequently classified into five categories, presented in Table 2, based on the aspect that they address: (a) primary study selection process, (b) validity of primary studies, (c) data reliability, (d) research design, and (e) reporting process. An additional factor that affects the quality of secondary studies is the level of detail and completeness of reporting. The criteria in Table 2 will be exploited after the development of the proposed classification schema. In particular, we check if the criteria in Table 2 are included in the list of mitigation actions; if not, we incorporate them in the proposed checklist as best practices (see Section 5).
The methodological quality of experiments and reviews performed in the medical domain was assessed by Downs et al. [10], who formed a checklist consisting of 26 items/questions for assessing the quality of randomized and non-randomized healthcare studies. The main quality aspects captured in this checklist involved the Reporting stage, the External Validity, the Internal Validity and the Selection Bias.

Table 2
Quality Assessment Criteria in Medical Studies.

Primary study selection:
- Was there duplicate study selection and data extraction? [31,39]
- Was a comprehensive literature search performed? [7,30,31,39,43]
- Was the status of publication (i.e., grey literature) used as an inclusion criterion? [39]
- Have additional studies been identified? [52]

Assessing validity of primary studies:
- Were the eligibility criteria specified? [45]
- Were statistical results and measures of variability presented for the primary outcome measures? [1,10,30,45]
- Was the quality of the included studies assessed? [16,31,39,45,52]

Data reliability:
- Was the likelihood of publication bias assessed? [11,37,39]
- Were methods for data extraction and analysis evaluated? [10,30,31,39,52]
- Was there any conflict of interest stated? [39]

Research design:
- Was an 'a priori' design provided? [31,39,43]
- Was the scientific quality of the included studies used appropriately in formulating conclusions? [39,43]
- Is a database, containing the relevant data, available as a resource for intervention planners and researchers? [52]
- Was other pertinent information identified to ensure the study intervention's applicability in settings and populations other than those studied by the investigators? [52]

Reporting process:
- Was a list of studies (included and excluded) provided? [31,39]
- Were the characteristics of the included studies provided? [39,52]
- Was the scientific quality of the included studies documented? [7,39]
Furthermore, the PRISMA-P meta-analysis protocol for systematic reviews has been proposed by Moher et al. [31], consisting of a checklist of 17 items categorized into three main sections: Administrative information, Introduction and Methods. The Administrative section mainly represents initial information on the authors, the funding and the title of the study; the Introduction section includes details on the rationale and the objectives of the study; while the Methods section specifies the information sources, the study selection criteria, the search string and the data analysis methods employed within the scope of the meta-analysis study. Moreover, the medical domain uses the Cochrane database³ (including the Database of Abstracts of Reviews of Effects) [7], which contains more than 15,000 abstracts of high quality reviews that are independently appraised by two reviewers according to the following six criteria: reporting of inclusion/exclusion criteria, adequacy of search, data synthesis, validity assessment of primary studies included, and detailed presentation of individual studies referenced.

³ http://community.cochrane.org/editorial-and-publishing-policy-resource/overview-cochrane-library-and-related-content/databases-included-cochrane-library/database-abstracts-reviews-effects-dare
Shea et al. [39] developed an instrument to assess the methodological quality of systematic reviews, building upon previous tools, empirical evidence and expert consensus. The tool was based on 11 components that summarized and synthesized evidence from an initial quality checklist that included 37 items. These items were subjected to principal component analysis and Varimax rotations. The validity of systematic reviews is also assessed by Slocum et al. [43], who advise the researchers of review studies to carefully define research questions and focus on them, and to systematically search the literature, validate primary studies and document the search process so as to enable reproducibility. Furthermore, publication bias is acknowledged as a significant problem by Dwan et al. [11], as it produces outcome reporting bias, due to the fact that positive results are easier to publish. In that case, the authors advise researchers to improve the reporting of trials (primary studies). Publication bias is also addressed by Rothstein [37], who suggests the use of funnel plots to detect it and the use of cumulative meta-analysis to assess its impact.
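The funnel-plot idea suggested by Rothstein [37] can be illustrated with a small simulation; the sketch below is our own (Python), with invented effect sizes, standard errors and a toy publication rule, not data from any cited work. Each primary study is a point (effect size vs. standard error); a visibly asymmetric, hollowed-out funnel hints at suppressed small or negative studies.

# Illustrative funnel plot for publication bias (toy data, our assumptions).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
true_effect = 0.3
se = rng.uniform(0.05, 0.4, size=60)        # per-study standard errors
effects = rng.normal(true_effect, se)       # observed per-study effects

# Toy publication rule: imprecise, non-significant studies go unpublished.
published = (effects / se > 1.0) | (se < 0.15)

plt.scatter(effects[published], se[published], s=15)
plt.axvline(true_effect, linestyle="--")
plt.gca().invert_yaxis()                    # most precise studies at the top
plt.xlabel("Effect size")
plt.ylabel("Standard error")
plt.title("Funnel plot: asymmetry suggests publication bias")
plt.show()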
Verhaegen et al. [45] adopted the Delphi technique, as a consensus method, to identify quality criteria for selecting the primary studies (referred to as Medical Clinical Trials) that participate in healthcare literature reviews. A three-round Delphi was performed, where each participant answered questions of the form "Should this item be included in the criteria list?" on a 5-point Likert scale. The quality criteria derived from the final Delphi round are included in Table 2. We note that we isolated the criteria that are not specialized for medical research. In this context, blind assessment of clinical trial studies, treated as primary studies in medical reviews, was proposed in [16]. The findings of [16] suggest that blind assessments are reliable, producing more consistent scores compared to open assessments.

Furthermore, a data collection instrument for performing systematic reviews for disease prevention was proposed by Zaza et al. [52]. The authors concluded with a six-point assessment form, the content of which was developed by reviewing methodologies from other systematic reviews; reporting standards established by major health and social science journals; the evaluation, statistical and meta-analytic literature; and by soliciting expert opinion.

Avellar et al. [1] scanned 19 reviews in the medical field in order to examine the level to which external validity is addressed. The results revealed that most studies lack statistical representativeness in terms of the generalizability threat and focus only on factors likely to increase the heterogeneity of primary studies and context [1]. With respect to these results, Avellar et al. [1] split external validity into three aspects: generalizability (related to the number of studies reporting the same result and the settings required to achieve a certain result), applicability (demographics of the population in which a certain result is achieved) and feasibility (description of an intervention required to be performed; in medical studies this relates to the dosage, the staff training, and the cost).
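Computationally, the three-round Delphi procedure described above reduces to measuring per-item agreement on the Likert scale and iterating until consensus. The sketch below is a minimal Python illustration under our own assumptions; in particular, the 70% agreement threshold is not a value taken from [45] or from this paper.

# Minimal Delphi consensus check on a 5-point Likert scale (threshold assumed).
from collections import Counter

def consensus(ratings, agree_levels=(4, 5), threshold=0.7):
    """An item reaches consensus if enough experts rate it 4 or 5."""
    counts = Counter(ratings)
    agreeing = sum(counts[level] for level in agree_levels)
    return agreeing / len(ratings) >= threshold

print(consensus([5, 4, 3, 2, 4, 5, 3]))  # round 1: split opinions -> False
print(consensus([5, 4, 4, 4, 5, 5, 4]))  # round 2: convergence -> True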
2.3. Overview of guidelines for conducting secondary studies in software engineering
In this section we present the most common guidelines for performing secondary studies in the software engineering domain, in an attempt to consider relevant methodological problems and gain insights from the reported advice and lessons learned. A summary of the guidelines provided for conducting secondary studies in the software engineering field is presented in Fig. 1. Similarly to the case of the quality assessment criteria in medical studies, we intend to use these guidelines after the development of the proposed classification schema. In particular, we check if the practices reported in Fig. 1 are included in the list of mitigation actions of the classification schema. Those that are not will be incorporated in the proposed checklist as best practices.
The guidelines of Kitchenham et al. [18] are considered seminal for performing Systematic Literature Reviews (SLRs) in software engineering. Three major stages for performing SLRs are suggested: Planning, Conducting and Reporting, each of which includes several mandatory …

References
- R.K. Yin, Case Study Research: Design and Methods (book).
- Assessing the quality of reports of randomized clinical trials: is blinding necessary? (journal article).
- Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement (journal article).