University of Groningen
Identifying, categorizing and mitigating threats to validity in software engineering secondary studies
Ampatzoglou, Apostolos; Bibi, Stamatia; Avgeriou, Paris; Verbeek, Marijn; Chatzigeorgiou, Alexander
Published in: Information and Software Technology
DOI: 10.1016/j.infsof.2018.10.006
Document version: Publisher's PDF, also known as Version of record
Publication date: 2019
Citation (APA): Ampatzoglou, A., Bibi, S., Avgeriou, P., Verbeek, M., & Chatzigeorgiou, A. (2019). Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Information and Software Technology, 106, 201-230. https://doi.org/10.1016/j.infsof.2018.10.006

Information and Software Technology 106 (2019) 201–230
Identifying, categorizing and mitigating threats to validity in software engineering secondary studies

Apostolos Ampatzoglou a,⁎, Stamatia Bibi b, Paris Avgeriou c, Marijn Verbeek c, Alexander Chatzigeorgiou a

a Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece
b Department of Informatics and Telecommunications, University of Western Macedonia, Kozani, Greece
c Department of Mathematics and Computer Science, University of Groningen, the Netherlands

⁎ Corresponding author. E-mail address: apostolos.ampatzoglou@gmail.com (A. Ampatzoglou).
Keywords:
Empirical software engineering
Secondary studies
Threats to Validity
Literature Review
Abstract

Context: Secondary studies are vulnerable to threats to validity. Although mitigating these threats is crucial for the credibility of these studies, we currently lack a systematic approach to identify, categorize and mitigate threats to validity for secondary studies.
Objective: In this paper, we review the corpus of secondary studies, with the aim to identify: (a) the trend of reporting threats to validity, (b) the most common threats to validity and corresponding mitigation actions, and (c) possible categories in which threats to validity can be classified.
Method: To achieve this goal we employ the tertiary study research method, which is used for synthesizing knowledge from existing secondary studies. In particular, we collected data from more than 100 studies, published until December 2016 in top quality software engineering venues (both journals and conferences).
Results: Our results suggest that in recent years, secondary studies are more likely to report their threats to validity. However, the presentation of such threats is rather ad hoc, e.g., the same threat may be presented with a different name, or under a different category. To alleviate this problem, we propose a classification schema for reporting threats to validity and possible mitigation actions. Both the classification of threats and the associated mitigation actions have been validated by an empirical study, i.e., Delphi rounds with experts.
Conclusion: Based on the proposed schema, we provide a checklist, which authors of secondary studies can use for identifying and categorizing threats to validity and corresponding mitigation actions, while readers of secondary studies can use the checklist for assessing the validity of the reported results.
1. Introduction
Empirical Software Engineering (ESE) research focuses on the application of empirical methods on any phase of the software development lifecycle. The three predominant types of empirical research are [44,47]: (a) surveys, which are performed through questionnaires or interviews on a sample in order to obtain characteristics of a population [36]; (b) case studies, which study phenomena in a "real-world" context, especially when the boundaries between phenomenon and context are not clear [51]; and (c) experiments, which have a limited scope and are most often run in a laboratory setting, with a high level of control [47].
In recent years, and mainly due to the rise of the Evidence-Based Software Engineering (EBSE) Paradigm¹ [22], two other types of studies have become quite popular [15]:
• Systematic Literature Reviews (SLRs) use data from previously published studies for the purpose of research synthesis, which is the collective term for a family of methods for summarizing, integrating and, when possible, combining the findings of different studies on a topic or research question. Such synthesis can also identify crucial areas and questions that have not been addressed adequately with past empirical research. It is built upon the observation that, no matter how well-designed and executed, empirical findings from individual studies are limited in the extent to which they may be generalized [18].
¹ EBSE is a movement in software engineering research that aims to provide the means by which current best evidence from research can be integrated with practical experience [22].
https://doi.org/10.1016/j.infsof.2018.10.006
Received 20 February 2018; Received in revised form 4 October 2018; Accepted 6 October 2018
Available online 9 October 2018

• Systematic Mapping Studies (SMSs), which use the same basic methodology as SLRs but aim to identify and classify all research related to a broad software engineering topic, rather than answering questions about the relative merits of competing technologies that conventional SLRs address. They are intended to provide an overview of a topic area and identify whether there are sub-topics with sufficient primary studies to conduct conventional SLRs, and also to identify sub-topics where more primary studies are needed [21].
The strength of evidence produced by ESE research depends largely on the use of systematic, rigorous guidelines on how to conduct and report empirical results (see e.g., for experiments [47], for SLRs [18], for mapping studies [34], for surveys [36], and for case studies [38]). One of the most crucial parts of conducting an empirical study is the management of threats to validity, i.e., possible aspects of the research design that in some way compromise the credibility of results. Despite this crucial role, we currently lack guidelines on how to identify, mitigate, and categorize threats to validity in secondary studies; this is in contrast to experiments, case studies and surveys, where mature guidelines exist. For this reason, researchers either do not report threats to validity for secondary studies, or report them in an ad hoc way (see Section 5). Specifically, the most common issues found in practice concern threats to validity being:
• Completely missing from certain studies. Thus, such studies do not provide any mitigation actions for them;
• Incorrectly categorized. The same threat is classified in different categories by different researchers (e.g., study selection bias is categorized in some studies as a threat to internal validity and in others as a threat to conclusion validity). Also, in some cases threats are inefficiently categorized based on guidelines for other types of empirical research (e.g., for experiments [45], or for case studies [38]), or under a custom categorization, which is not uniform. One possible reason for this problem is the fact that threat categories are not orthogonal, especially in cases where they stem from different schools of thought or guidelines (see Section 2.1). For example, reliability examines if the results of a study depend highly on the involved researchers. In turn, this relates to conclusion validity, in the sense that people are prone to biases (e.g., due to previous experiences, preferences on research, etc.);
• Inconsistently named. The same threat is reported with a different name by different researchers (e.g., the terms publication bias and researcher bias are used for describing the same threat);
• Inconsistently mitigated. The same threat is mitigated differently by different researchers. Although this provides a variety of available mitigation actions, some mitigation actions are ineffective and cause confusion to readers who consider following them.
These issues, in turn, lead to difficulty in evaluating the validity of the reported results and hinder a uniform comparison between secondary studies. In addition, the lack of guidance for mitigating threats to validity, which could serve as a reference point, makes it more difficult to reuse mitigation strategies, as well as to consistently identify and categorize both threats and mitigation actions.
To address this problem, we conducted a tertiary study (i.e., an SLR on secondary studies), so as to retrieve and analyze how software engineering secondary studies identify, categorize and mitigate threats to validity. The objective of this tertiary study is: "to summarize secondary studies that report threats to validity, with the aim of identifying: (a) the frequency of reporting threats to validity over the years, (b) the most common threats to validity and (c) the corresponding mitigation actions, and (d) a possible classification schema of threats to validity". The main outcomes of the study are a classification schema for threats to validity and a checklist that can be used while conducting/evaluating secondary studies. The outcomes are expected to contribute towards establishing a standard and consistent way of identifying, categorizing and mitigating threats to validity of secondary studies. In addition, in order to enrich the outcomes of this work, we explored existing literature in two related research sub-fields: (a) secondary studies in medical science (i.e., the area from which the Evidence-Based paradigm has emerged), and (b) guidelines for conducting secondary studies. Related studies from medical science and the guidelines for performing secondary studies have led to the identification of best practices in secondary studies that can be applied as mitigation actions for minimizing the effects of a validity threat, enriching the checklist that has been derived from the classification schema. Finally, acknowledging the subjectivity in the qualitative nature of this work, we validated the outcomes through a Delphi method based on the opinion of experts in secondary studies and empirical studies in general. The Delphi method was iterated in three rounds and provided preliminary evidence for the merits of the classification schema and checklist.
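To make the intended use of such a checklist concrete, the sketch below (hypothetical Python; the three category names follow the schema discussed later in the paper, while the threat and mitigation wording is our own illustration) encodes threats and mitigation actions per category and reports which checklist threats a given secondary study left unaddressed:

# Hypothetical sketch of the checklist as a lookup table; category names
# (study selection, data, research validity) follow the paper's schema,
# but the threat/mitigation entries below are illustrative only.
CHECKLIST = {
    "study selection validity": {
        "missing relevant studies": [
            "search multiple digital libraries",
            "cross-check the study set against other tertiary studies",
        ],
    },
    "data validity": {
        "small sample size": [
            "broaden the search scope",
            "avoid generalizing from a small sample",
        ],
    },
    "research validity": {
        "lack of repeatability": [
            "publish the review protocol and dataset",
        ],
    },
}

def unaddressed(reported_threats):
    """Return checklist threats that a secondary study did not report."""
    return [
        threat
        for threats in CHECKLIST.values()
        for threat in threats
        if threat not in reported_threats
    ]

# A study that only discussed its search strategy:
print(unaddressed({"missing relevant studies"}))
# -> ['small sample size', 'lack of repeatability']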
We note that literature reviews were performed long before the advent of the terms 'Systematic Mapping Study' and 'Systematic Literature Review' and the corresponding guidelines. We also acknowledge that secondary studies can be performed without following the guidelines for SMSs and SLRs (especially before the two terms became popular). However, such non-systematic literature reviews have, in the vast majority of cases, not reported threats to their conclusions. Reporting of threats became popular once specific guidelines were proposed and adopted in the context of the EBSE paradigm. Thus, for a study aiming at systematically analyzing the reported threats, we consider it proper to focus on the studies that have adopted the corresponding guidelines. For the rest of the study, when we refer to secondary studies, we refer to Systematic Mapping Studies and Systematic Literature Reviews.
The rest of the paper is organized as follows: Section 2 presents related work, i.e., categories of threats to validity in other empirical methods; Section 3 presents our tertiary study protocol; Section 4 reports on the results; and Section 5 discusses the proposed guidelines for identifying, categorizing and mitigating threats to validity for secondary studies in software engineering. In Section 6, we present the design and results of our validation study, whereas in Sections 7 and 8 we present threats to validity and conclude the paper.
2. Related work
The empirical software engineering literature points out the relevance and importance of identifying and recording validity threats as an aspect of research quality [12,32] and [35]. According to Perry et al. [32], the structure of an empirical study in SE should include a section on threats to validity. This section should discuss the influences that may limit the authors' and readers' ability to interpret or draw conclusions from the study's data. In addition, Jedlitschka et al. [17] suggest that each controlled experiment in SE should have a subsection named "Limitation of the study", where all threats that may have an impact on the validity of results should be mentioned. Furthermore, Kitchenham [22] has also underlined the importance of threats to validity, by highlighting that the implications of a validity threat should be addressed and thoroughly discussed. Finally, Sjoberg et al. [42] emphasize the scope of validity of the results of an SE study; the term 'scope of validity' is interpreted as the population of actors, technologies, activities, and software systems for which the results of a study are valid. The scope of validity is considered crucial for producing general knowledge synthesized by comparing and integrating results from different studies.
In this section we present related work under three perspectives. First, we present how threats to validity are categorized in the empirical software engineering field (see Section 2.1). Second, in Section 2.2, we present studies that are related to the identification and reporting of threats to validity in medical science. This can provide valuable input for our work, since medical research is considered a more mature field in secondary study design and execution and has already inspired the guidelines for conducting secondary studies in software engineering. Finally, in Section 2.3, we present the most common guidelines for performing secondary studies in the software engineering domain, as they can also provide input for our work.

Table 1
Categories of Threats to Validity in ESE Research.

Conclusion validity: Originally called "statistical conclusion validity", this aspect deals with the degree to which conclusions reached (e.g., about relationships between factors) are reasonable within the data collected. Researcher bias, for example, can greatly impact conclusions reached and can be considered a threat to conclusion validity. Similarly, statistical analysis may lead to weak results that can be interpreted in different ways according to the bias of the researcher. In either case the researcher may reach the wrong conclusion [47].

Reliability: This aspect is concerned with the extent to which the data and the analysis depend on the specific researchers. An example of this type of threat is the unclear coding of collected data. If a researcher produces certain results, then other researchers should be able to reproduce identical results following the same methodology of the study [38].

Internal validity: This aspect relates to the examination of causal relations. Internal validity examines whether an experimental treatment/condition makes a difference or not, and whether there is evidence to support the claim [47].

Construct validity: Defines how effectively a test or experiment measures up to its claims. This aspect deals with whether or not the researcher measures what is intended to be measured [47].

External validity: The concern of this aspect is whether the results can be generalized. During the analysis of this validity, the researcher attempts to see if findings of the study are of relevance for others. In the case of quantitative research (experiments), this primarily relies on the chosen sample size. In contrast, case studies normally have a low sample size, so the researcher has to try and analyze to what extent the findings can be related to other cases [47].
2.1. Threats to validity in empirical software engineering
Threats to validity have often been categorized into different types in the literature on general research methods. Initially, Cook and Campbell [8]² recorded four types of validity threats in quantitative experimental analysis: statistical conclusion validity, internal validity, construct validity of putative causes and effects, and external validity. Concerning qualitative research, Maxwell [29] provided a general categorization of threats that can be mapped to Cook and Campbell's categorization as follows: theoretical validity (construct validity), generalizability (internal, external validity), and interpretive validity (statistical conclusion validity). An additional threat category, mentioned by Maxwell [29], is descriptive validity, which is relevant only for qualitative studies. Descriptive validity reflects the accuracy and objectivity of the information gathered. For example, when researchers collect statements from participants, threats to validity can be related to the way that researchers recorded or transcribed the statements. Other types of validity threats found in the literature are: reliability [38,51]; transferability, credibility and confirmability [27]; and uncontrollability and contingency [14].

² Before publishing this paper (i.e., [8]), Cook and Campbell had published an online chapter focused on Conclusion and Internal validity threats.
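For quick reference, the correspondence between Maxwell's qualitative categories and Cook and Campbell's quantitative ones, as stated above, can be written down as a small lookup table (an illustrative Python rendering, not an artifact of either source):

# Maxwell's [29] categories mapped onto Cook and Campbell's [8], per the text.
MAXWELL_TO_COOK_CAMPBELL = {
    "theoretical validity": ["construct validity"],
    "generalizability": ["internal validity", "external validity"],
    "interpretive validity": ["statistical conclusion validity"],
    "descriptive validity": [],  # qualitative-only, no quantitative counterpart
}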
In the empirical SE community there are two main schools on reporting threats to validity: (a) Wohlin et al. [47], who adopted Cook and Campbell's [8] categorization of validity threats and presented four main types of threats to validity for quantitative research within software engineering: conclusion, internal, construct, and external validity; and (b) Runeson et al. [38], who discussed four main types of validity threats for case studies within software engineering: reliability, internal, construct, and external validity. The threats of Runeson et al. [38] are similar to those of Wohlin et al. [47], with the exception of reliability replacing conclusion validity.
Bi et al. [4] argue that researchers should also consolidate actual
experimental research on a specic topic to complement existing generic
threats and guidelines when performing their research. The tradeo be-
tween internal and external validity has been addressed by Siegmund
et al. [40] , where the authors performed a survey and concluded that
externally valid papers are of greater practicality while internally valid
studies seem to be unrealistic. Additionally, the study examined the im-
pact of replication studies and found that although researchers realize
the necessity of such studies they are reluctant to conduct or review
them mainly due to the fact that there are no guidelines for performing
them [40] . A list of denitions of the union of the aforementioned cat-
egories of threats to validity (i.e. from [38] and [47] ) are presented in
Table 1 .
Petersen et al. [35], based on the categorizations of threats to validity suggested by Maxwell, suggested a checklist that can help researchers identify the threats applicable to the type of research performed, by reporting first their world-view and then the research method applied. A secondary study attempting to assess the practices in reporting validity threats in ESE [12] concluded that more than 20% of the studied papers contain no discussion of validity threats, and those that do discuss validity threats contain, on average, 5.44 threats.
Regarding threats to validity for secondary studies in software engineering, we have been able to identify only one related work. In particular, Zhou et al. [53] have performed a tertiary study on more than 300 secondary studies published until 2015. The authors have identified 23 threats to validity for secondary studies, and organize the consequences of these threats into four categories: internal, external, conclusion, and construct validity. To alleviate these threats, the authors map the threats and possible consequences to 24 mitigation strategies. This paper shares common goals with our study; however, ours is broader in the sense that: (a) it covers a wider timeframe (until 2017 instead of the middle of 2015); (b) it focuses only on top-quality venues, which are expected to pay special attention to the proper application of methodological guidelines, such as the proper reporting of threats to validity, a fact that increases the quality of the obtained data; and, most importantly, (c) our study answers two additional RQs, providing a classification schema and a checklist for identifying, mitigating, and reporting threats to validity. In addition, as indirect related work (especially in terms of mitigation actions), in Section 2.3 we present a review of guidelines on secondary studies in software engineering.
2.2. Threats to validity in medical science
In this section we report on quality assessment strategies for systematic reviews from the medical science literature. While there is no classification of threats to validity for secondary studies, or corresponding mitigation actions, in medical research, these quality assessment strategies can provide useful input for deriving such outcomes in the software engineering domain. In particular, we identify a number of quality assessment criteria based on the guidelines, checklists and protocols found in the medical research literature. These quality assessment criteria are subsequently classified into five categories, presented in Table 2, based on the aspect that they address: (a) primary study selection process, (b) validity of primary studies, (c) data reliability, (d) research design, and (e) reporting process. An additional factor that affects the quality of secondary studies is the level of detail and completeness of reporting. The criteria in Table 2 will be exploited after the development of the proposed classification schema. In particular, we check if the criteria in Table 2 are included in the list of mitigation actions; if not, we incorporate them in the proposed checklist as best practices (see Section 5).
The methodological quality of experiments and reviews performed in the medical domain was assessed by Downs et al. [10], who formed a checklist consisting of 26 items/questions for assessing the quality of randomized and non-randomized healthcare studies. The main quality aspects captured in this checklist involved the Reporting stage, the External Validity, the Internal Validity and the Selection Bias.

Table 2
Quality Assessment Criteria in Medical Studies.

Primary study selection:
- Was there duplicate study selection and data extraction? [31,39]
- Was a comprehensive literature search performed? [7,30,31,39,43]
- Was the status of publication (i.e., grey literature) used as an inclusion criterion? [39]
- Have additional studies been identified? [52]

Assessing validity of primary studies:
- Were the eligibility criteria specified? [45]
- Were statistical results and measures of variability presented for the primary outcome measures? [1,10,30,45]
- Was the quality of the included studies assessed? [16,31,39,45,52]

Data reliability:
- Was the likelihood of publication bias assessed? [11,37,39]
- Were methods for data extraction and analysis evaluated? [10,30,31,39,52]
- Was there any conflict of interest stated? [39]

Research design:
- Was an 'a priori' design provided? [31,39,43]
- Was the scientific quality of the included studies used appropriately in formulating conclusions? [39,43]
- Is a database, containing the relevant data, available as a resource for intervention planners and researchers? [52]
- Was other pertinent information identified to ensure the study intervention's applicability in settings and populations other than those studied by the investigators? [52]

Reporting process:
- Was a list of studies (included and excluded) provided? [31,39]
- Were the characteristics of the included studies provided? [39,52]
- Was the scientific quality of the included studies documented? [7,39]
Furthermore, the PRISMA-P meta-analysis protocol for systematic reviews has been proposed by Moher et al. [31], consisting of a checklist of 17 items categorized into three main sections: Administrative information, Introduction and Methods. The Administrative section mainly represents initial information on the authors, the funding and the title of the study; the Introduction section includes details on the rationale and the objectives of the study; while the Methods section specifies the information sources, the study selection criteria, the search string and the data analysis methods employed within the scope of the meta-analysis study. Moreover, the medical domain uses the Cochrane database³ (including the Database of Abstracts of Reviews of Effects) [7], which contains more than 15,000 abstracts of high quality reviews that are independently appraised by two reviewers according to the following six criteria: reporting of inclusion/exclusion criteria, adequacy of search, data synthesis, validity assessment of primary studies included, and detailed presentation of individual studies referenced.

³ http://community.cochrane.org/editorial-and-publishing-policy-resource/overview-cochrane-library-and-related-content/databases-included-cochrane-library/database-abstracts-reviews-effects-dare
Shea et al. [39] developed an instrument to assess the methodological quality of systematic reviews, building upon previous tools, empirical evidence and expert consensus. The tool was based on 11 components that summarized and synthesized evidence from an initial quality checklist that included 37 items. These items were subjected to principal component analysis and Varimax rotations. The validity of systematic reviews is also assessed by Slocum et al. [43], who advise the researchers of review studies to carefully define research questions and focus on them, and to systematically search the literature, validate primary studies and document the search process so as to enable reproducibility. Furthermore, publication bias is acknowledged as a significant problem by Dwan et al. [11], as it produces outcome reporting bias, due to the fact that positive results are easier to publish. In that case, the authors advise researchers to improve the reporting of trials (primary studies). Publication bias is also addressed by Rothstein [37], who suggests the use of funnel plots to detect it and the use of cumulative meta-analysis to assess its impact.
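The funnel-plot idea suggested by Rothstein [37] can be illustrated with a small simulation; the sketch below is our own (Python), with invented effect sizes, standard errors and a toy publication rule, not data from any cited work. Each primary study is a point (effect size vs. standard error); a visibly asymmetric, hollowed-out funnel hints at suppressed small or negative studies.

# Illustrative funnel plot for publication bias (toy data, our assumptions).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
true_effect = 0.3
se = rng.uniform(0.05, 0.4, size=60)        # per-study standard errors
effects = rng.normal(true_effect, se)       # observed per-study effects

# Toy publication rule: imprecise, non-significant studies go unpublished.
published = (effects / se > 1.0) | (se < 0.15)

plt.scatter(effects[published], se[published], s=15)
plt.axvline(true_effect, linestyle="--")
plt.gca().invert_yaxis()                    # most precise studies at the top
plt.xlabel("Effect size")
plt.ylabel("Standard error")
plt.title("Funnel plot: asymmetry suggests publication bias")
plt.show()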
Verhaegen et al. [45] adopted the Delphi technique, as a consensus method, to identify quality criteria for selecting the primary studies (referred to as Medical Clinical Trials) that participate in healthcare literature reviews. A three-round Delphi was performed, where each participant answered questions of the form "Should this item be included in the criteria list?" on a 5-point Likert scale. The quality criteria derived from the final Delphi round are included in Table 2. We note that we isolated the criteria that are not specialized for medical research. In this context, blind assessment of clinical trial studies, treated as primary studies in medical reviews, was proposed in [16]. The findings of [16] suggest that blind assessments are reliable, producing more consistent scores compared to open assessments.

Furthermore, a data collection instrument for performing systematic reviews for disease prevention was proposed by Zaza et al. [52]. The authors concluded with a six-point assessment form, the content of which was developed by reviewing methodologies from other systematic reviews; reporting standards established by major health and social science journals; the evaluation, statistical and meta-analytic literature; and by soliciting expert opinion.

Avellar et al. [1] scanned 19 reviews in the medical field in order to examine the level to which external validity is addressed. The results revealed that most studies lack statistical representativeness in terms of the generalizability threat and focus only on factors likely to increase the heterogeneity of primary studies and context [1]. With respect to these results, Avellar et al. [1] split external validity into three aspects: generalizability (related to the number of studies reporting the same result and the settings required to achieve a certain result), applicability (demographics of the population in which a certain result is achieved) and feasibility (description of an intervention required to be performed; in medical studies this relates to the dosage, the staff training, and the cost).
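Computationally, the three-round Delphi procedure described above reduces to measuring per-item agreement on the Likert scale and iterating until consensus. The sketch below is a minimal Python illustration under our own assumptions; in particular, the 70% agreement threshold is not a value taken from [45] or from this paper.

# Minimal Delphi consensus check on a 5-point Likert scale (threshold assumed).
from collections import Counter

def consensus(ratings, agree_levels=(4, 5), threshold=0.7):
    """An item reaches consensus if enough experts rate it 4 or 5."""
    counts = Counter(ratings)
    agreeing = sum(counts[level] for level in agree_levels)
    return agreeing / len(ratings) >= threshold

print(consensus([5, 4, 3, 2, 4, 5, 3]))  # round 1: split opinions -> False
print(consensus([5, 4, 4, 4, 5, 5, 4]))  # round 2: convergence -> True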
2.3. Overview of guidelines for conducting secondary studies in software engineering
In this section we present the most common guidelines for performing secondary studies in the software engineering domain, in an attempt to consider relevant methodological problems and gain insights from the reported advice and lessons learned. A summary of the guidelines provided for conducting secondary studies in the software engineering field is presented in Fig. 1. Similarly to the case of the quality assessment criteria in medical studies, we intend to use these guidelines after the development of the proposed classification schema. In particular, we check if the practices reported in Fig. 1 are included in the list of mitigation actions of the classification schema. Those that are not will be incorporated in the proposed checklist as best practices.
The guidelines of Kitchenham et al. [18] are considered seminal for performing Systematic Literature Reviews (SLRs) in software engineering. Three major stages for performing SLRs are suggested: Planning, Conducting and Reporting, each of which includes several mandatory …

References
- R.K. Yin, Case Study Research: Design and Methods (book).
- Assessing the quality of reports of randomized clinical trials: is blinding necessary? (journal article).
- Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement (journal article).