
Requirements Coverage as an Adequacy Measure for Conformance Testing

Ajitha Rajan (1), Michael Whalen (2), Matt Staats (1), and Mats P.E. Heimdahl (1)

(1) University of Minnesota
(2) Rockwell Collins Inc.

arajan@cs.umn.edu, mwwhalen@rockwellcollins.com, staats@cs.umn.edu, heimdahl@cs.umn.edu
Abstract. Conformance testing in model-based development refers to the testing activity that verifies whether the code generated (manually or automatically) from the model is behaviorally equivalent to the model. Presently the adequacy of conformance testing is inferred by measuring structural coverage achieved over the model. We hypothesize that adequacy metrics for conformance testing should consider structural coverage over the requirements either in place of or in addition to structural coverage over the model. Measuring structural coverage over the requirements gives a notion of how well the conformance tests exercise the required behavior of the system.

We conducted an experiment to investigate the hypothesis stating structural coverage over formal requirements is more effective than structural coverage over the model as an adequacy measure for conformance testing. We found that the hypothesis was rejected at 5% statistical significance on three of the four case examples in our experiment. Nevertheless, we found that the tests providing requirements coverage found several faults that remained undetected by tests providing model coverage. We thus formed a second hypothesis stating that complementing model coverage with requirements coverage will prove more effective as an adequacy measure than solely using model coverage for conformance testing. In our experiment, we found test suites providing both requirements coverage and model coverage to be more effective at finding faults than test suites providing model coverage alone, at 5% statistical significance. Based on our results, we believe existing adequacy measures for conformance testing that only consider model coverage can be strengthened by combining them with rigorous requirements coverage metrics.
This work has been partially supported by NASA Ames Research Center Cooperative
Agreement NNA06CB21A, NASA IV&V Facility Contract NNG-05CB16C, and the
L-3 Titan Group.

1 Introduction
In critical avionics applications, the validation and verification phase (V&V) is
particularly costly and consumes a disproportionately large share of the devel-
opment resources. Thus, if the process of deriving test cases for V&V can be
automated to provide test suites that satisfy the most stringent standards (such
as DO-178B in civil avionics [20]), dramatic time and cost savings can be re-
alized. The current trend towards model-based development is one attempt to
address this problem. In model-based software development, the traditional test-
ing process is split into two distinct activities: one activity that tests the model
to validate that it accurately captures the customers’ high-level requirements,
and another testing activity that verifies whether the code generated (manually
or automatically) from the model is behaviorally equivalent to (or conforms to)
the model. (Note that by “model”, we are referring specifically to a high level
formal model written in a language such as Simulink or Lustre. Throughout
this paper, we refer to this simply as a “model”.) In this paper, we focus on
the second testing activity—verification through conformance testing. There are
currently several tools, such as model checkers, that provide the capability to
automatically generate conformance tests [19, 7] from formal models. In this pa-
per, we examine the effectiveness of metrics used in measuring the adequacy of
the generated conformance tests.
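(As an aside on mechanics: such model-checker-based generation is commonly done by negating each coverage obligation into a "trap property"; every counterexample the checker returns is a trace that exercises the obligation and can be replayed as a test case. The following is a minimal sketch of that loop, in Python for readability; the names Trace, trap_property, generate_tests and the check hook are hypothetical placeholders, not the tool chain used in this work.)

```python
from typing import Callable, List, Optional

# A counterexample trace: a sequence of states (variable -> value maps).
Trace = List[dict]


def trap_property(obligation: str) -> str:
    """Negate a coverage obligation so that any counterexample to the
    resulting property is a trace that satisfies the obligation."""
    return f"!({obligation})"


def generate_tests(obligations: List[str],
                   check: Callable[[str], Optional[Trace]]) -> List[Trace]:
    """`check` is a hypothetical hook that runs a model checker on a property
    and returns a counterexample trace, or None if the property holds
    (i.e., the obligation is unachievable)."""
    tests: List[Trace] = []
    for ob in obligations:
        trace = check(trap_property(ob))
        if trace is not None:        # obligation is achievable
            tests.append(trace)      # naive: one test per obligation, often redundant
    return tests
```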
For critical avionics software, DO-178B necessitates test cases used in verifi-
cation to achieve requirements coverage in addition to structural coverage over
the code. However, there is no direct and objective measure of requirements
coverage, and adequacy of tests is instead inferred by examining structural cov-
erage achieved over the model. The Modified Condition and Decision Coverage
(MC/DC) used when testing highly critical software [20] in the avionics industry
has been a natural choice to measure structural coverage for the most critical
models. In our work [21], however, we have defined coverage metrics that pro-
vide direct and objective measures of how well a test suite exercises a set of
high-level formal software requirements. We examined using requirements cov-
erage metrics, in particular the Unique First Cause (UFC) coverage metric, to
measure adequacy of tests used in model validation (or black-box testing) and
found them to be useful. To save time and effort, we would like to re-use val-
idation tests providing requirements coverage for verification of code through
conformance testing as well. This paper examines the suitability of using tests
providing requirements UFC coverage for conformance testing as opposed to
tests providing MC/DC over the model.
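For concreteness, consider a purely hypothetical requirement (not drawn from our case examples): "whenever the autopilot is engaged, a lateral mode shall eventually become active." It might be formalized in LTL as:

```latex
% Hypothetical requirement formalized in LTL (G = globally, F = eventually)
\mathbf{G}\left( \mathit{ap\_engaged} \;\rightarrow\; \mathbf{F}\,\mathit{lateral\_mode\_active} \right)
```

Loosely speaking, UFC coverage derives one temporal obligation per atomic condition of such a formula, each requiring a test trace on which that condition is the unique first cause of the requirement being satisfied; see [21] for the precise definition.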
We believe requirements coverage will be useful as an adequacy measure for
conformance testing for several reasons. First, measuring structural coverage
over the requirements gives a direct assessment of how well the conformance
tests exercise the required behavior of the system. Second, if a model is miss-
ing functionality, measuring structural coverage over the model will not expose
such defects of omission. Third, obligations for requirements coverage describe
satisfying scenarios (paths) in the model as opposed to satisfying states defined
by common model coverage obligations (such as MC/DC). We believe coverage

obligations that define satisfying paths will necessitate longer and more effective
test cases than those defining satisfying states in the model. Finally, we found
in [16] that structural coverage metrics over the model, in particular MC/DC, are
sensitive to the structure of the model used in coverage measurement. Therefore,
these metrics can be easily rendered ineffective by (purposely or inadvertently)
restructuring the model to make it easier to achieve the desired coverage.
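To illustrate the last point, the two hypothetical fragments below compute the same output, yet present MC/DC with different obligations: in the inlined form each condition must be shown to independently affect the observable outcome, whereas after introducing an intermediate variable each condition only has to affect its own, smaller decision, which is typically easier to cover. Python is used purely for readability; the actual artifacts are Simulink/Lustre models.

```python
def alarm_inlined(overspeed: bool, stall: bool, manual_override: bool) -> bool:
    # One decision with three conditions: MC/DC obligations require each of
    # overspeed, stall and manual_override to independently flip this result.
    return (overspeed or stall) and not manual_override


def alarm_restructured(overspeed: bool, stall: bool, manual_override: bool) -> bool:
    # Behaviorally identical, but the logic is split across two smaller
    # decisions; MC/DC measured over each decision separately is easier to
    # achieve, and the resulting tests tend to be weaker [16].
    hazard = overspeed or stall
    return hazard and not manual_override
```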
For these reasons, we believe that requirements coverage will serve as a
stronger adequacy measure than model coverage in measuring adequacy of con-
formance test suites. More specifically, we investigate the following hypothesis
in this paper:
Hypothesis 1 (H1): Conformance tests providing requirements UFC coverage are more effective at fault finding than conformance tests providing MC/DC over the model.
We evaluated this hypothesis on four industrial examples from the civil avion-
ics domain. The requirements for these systems are formalized as Linear Tem-
poral Logic (LTL) [5] properties. The systems were modeled in the Simulink
notation [12]. Using the Simulink models, we created implementations that we
used as the basis for the generation of large sets of mutants by randomly seed-
ing faults. We generated numerous test suites to provide 100% achievable UFC
coverage over the LTL properties (the formal requirements), and numerous test
suites to provide 100% achievable MC/DC over the model. We assessed the ef-
fectiveness of the different test suites by measuring their fault finding capability,
i.e., running them over the sets of mutants and measuring the number of faults
detected.
In our experiment we found that Hypothesis 1 was rejected on three of the
four examples at the 5% statistical significance level. This result was somewhat
disappointing since we believed that the requirements coverage would be effec-
tive as a conformance testing measure. The astute reader might point out that
the result might not be surprising since the effectiveness of the requirements-
based tests providing UFC coverage heavily depends on the ‘goodness’ of the
requirements set; in other words, a poor set of requirements leads to poor tests.
In this case, however, we worked with case examples with very good sets of re-
quirements and we had expected better results. Nevertheless, we found that the
tests providing requirements UFC coverage found several faults that remained
undetected by tests providing MC/DC over the model. We thus formed a second
hypothesis stating that complementing model coverage with requirements cov-
erage will prove more effective as an adequacy measure than solely using model
coverage for conformance testing. To investigate this, we formulated and tested
the following hypothesis:
Hypothesis 2 (H2): Conformance tests providing requirements UFC coverage in addition to MC/DC over the model are more effective at fault finding than conformance tests providing only MC/DC over the model.
In our second set of experiments, the combined test suites were significantly
more effective than MC/DC test suites on three of the four case examples (at the

5% statistical significance level). For these examples, UFC suites found several
faults not revealed by the MC/DC suites, making the combination of UFC and
MC/DC more effective than MC/DC alone. The relative improvement was in
the range of 4.3% to 10.8% on these examples. We strongly believe that for
the case example that did not support Hypothesis 2, the MC/DC suite found
all possible faults, making improvement with the combined suites impossible.
Based on our results, we believe that existing adequacy measures for conformance
testing based solely on structural coverage over the model (such as MC/DC) can
be strengthened by combining them with requirements coverage metrics such as
UFC. It is worth noting that Briand et al. found similar results in their study [3],
though in the context of state-based testing for complex component models in
object-oriented software. Combining a state-based testing technique for classes
or class clusters modeled with statecharts [8], with a black-box testing technique,
category partition testing, proved significantly more effective in fault detection.
We recommend that future measures of conformance testing adequacy consider
both requirements and model coverage, either by combining existing metrics,
such as MC/DC and UFC, or by defining new metrics that account for both.
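Statistical significance in these comparisons is assessed with a permutation test, in which a reference distribution is obtained from all possible permutations of the observed fault finding measurements [6, 11]. The sketch below is a one-sided Monte Carlo variant of such a test (random permutations rather than exhaustive enumeration); it is illustrative only, not the analysis script used in the study.

```python
import random
from typing import List, Sequence


def permutation_test(x: Sequence[float], y: Sequence[float],
                     iterations: int = 10000, seed: int = 0) -> float:
    """One-sided permutation test of H0: mean(x) <= mean(y).

    x and y are fault finding measurements (e.g., percentage of mutants
    killed) for two families of test suites.  Returns an approximate
    p-value obtained by randomly permuting the pooled observations."""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled: List[float] = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if sum(px) / len(px) - sum(py) / len(py) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / iterations


# Hypothetical usage: reject H0 at the 5% level if p < 0.05.
# p = permutation_test(combined_suite_scores, mcdc_suite_scores)
```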
The remainder of the paper is organized as follows. Section 2 introduces our
experimental setup and the case examples used in our investigation. Results and
statistical analysis are presented in Section 3. Finally, in Sections 4 and 5, we
analyze and discuss the implications of our results and point to future directions.
2 Experiment
We use four industrial systems in our experiment: two models from a display
window manager for an air-transport class aircraft (DWM 1, DWM 2), and two
models representing flight guidance mode logic for business and regional jet
class aircraft (Vertmax Batch and Latctl Batch). All four systems were judged
by their developers to have good sets of requirements. We conducted the
experiments for each case example using the steps outlined below
(elaborated in later sections):
1. Generate and reduce test suites to provide requirements UFC cov-
erage: We generated a test suite to provide UFC coverage over the formalized
LTL requirements. This test suite was naïvely generated, one test case for every
UFC obligation, and thus highly redundant. We reduced the test suite randomly
while maintaining UFC coverage over the requirements. We generated three such
randomly reduced test suites.
2. Generate and reduce test suites to provide MC/DC over the model:
We naïvely generated a test suite to provide MC/DC over the model. We then
randomly reduced the test suite to maintain MC/DC over the model. We gen-
erated three such reduced test suites.
3. Combined test suites that provide MC/DC + requirements UFC:
Among the reduced MC/DC suites from the previous step, we selected the most
effective MC/DC test suite based on its fault finding ability. We merged this
test suite with each of the reduced UFC test suites from the first step. The

combined suites thus provide both MC/DC over the model and UFC coverage
over the requirements.
4. Generate mutants: We randomly seeded faults in the correct implemen-
tation and generated three sets of 200 mutants using the method outlined in
Section 2.3.
5. Assess and compare fault finding: We ran each of the test suites from steps 1, 2 and 3 (providing requirements UFC coverage, MC/DC over the model, and MC/DC + requirements UFC coverage, respectively) against each set of mutants and against the model, which serves as the oracle in conformance testing. We say that a mutant is killed (or detected) by a test suite if any of the test cases in the suite results in different output values between the model and the mutant. We recorded the number of mutants killed by each test suite and computed the fault finding ability as the percentage of seeded mutants that were killed. A sketch of the reduction and fault finding measurement is given below.
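The following Python sketch outlines the core of steps 1, 2 and 5: randomly reducing a naïvely generated suite while preserving coverage, and measuring fault finding against a set of mutants. The coverage and execution hooks (obligations_covered, outputs) are hypothetical stand-ins for the actual tooling, and the greedy random reduction shown is one plausible reading of "reduced randomly while maintaining coverage".

```python
import random
from typing import Callable, List, Sequence, Set


def reduce_suite(tests: Sequence[object],
                 obligations_covered: Callable[[object], Set[str]],
                 rng: random.Random) -> List[object]:
    """Randomly reduce a test suite while maintaining its coverage:
    visit tests in random order and keep one only if it covers an
    obligation not yet covered by the tests kept so far."""
    order = list(tests)
    rng.shuffle(order)
    kept, covered = [], set()
    for t in order:
        new = obligations_covered(t) - covered
        if new:
            kept.append(t)
            covered |= new
    return kept


def fault_finding(suite: Sequence[object], model: object,
                  mutants: Sequence[object],
                  outputs: Callable[[object, object], list]) -> float:
    """Percentage of mutants killed: a mutant is killed if some test in the
    suite produces outputs on the mutant that differ from the model's."""
    killed = sum(
        1 for m in mutants
        if any(outputs(t, m) != outputs(t, model) for t in suite)
    )
    return 100.0 * killed / len(mutants)
```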
2.1 Case Examples
In our experiment, we use four industrial systems. All four systems were modeled
using the Simulink notation from MathWorks Inc.
Display Window Manager Models (DWM 1 and DWM 2): The Display
Window Manager models, DWM 1 and DWM 2, represent 2 of the 5 major
subsystems of the Display Window Manager (DWM) of an air transport-level
commercial displays system. The DWM acts as a 'switchboard' for the system
and has several responsibilities related to routing information to the displays
and managing the location of two cursors that the pilot and copilot can use to
control applications.
Flight Guidance System: A Flight Guidance System is a component of
the overall Flight Control System (FCS) in a commercial aircraft. It compares
the measured state of an aircraft (position, speed, and altitude) to the desired
state and generates pitch and roll-guidance commands to minimize the difference
between the measured and desired state. The FGS consists of the mode logic,
which determines which lateral and vertical modes of operation are active and
armed at any given time, and the flight control laws that accept information
about the aircraft’s current and desired state and compute the pitch and roll
guidance commands. The two FGS models in this paper focus on the mode logic
of the FGS. The Vertmax Batch and Latctl Batch models describe the vertical
and lateral mode logic for the flight guidance system.
2.2 Test Suite Generation and Reduction
We generated test suites to provide UFC coverage over formal LTL requirements
and to provide MC/DC over the model. The approach to generate and reduce
