
Requirements Coverage as an Adequacy Measure for Conformance Testing

Ajitha Rajan (1), Michael Whalen (2), Matt Staats (1), and Mats P.E. Heimdahl (1)

(1) University of Minnesota
(2) Rockwell Collins Inc.

arajan@cs.umn.edu, mwwhalen@rockwellcollins.com, staats@cs.umn.edu, heimdahl@cs.umn.edu
Abstract. Conformance testing in model-based development refers to the testing activity that verifies whether the code generated (manually or automatically) from the model is behaviorally equivalent to the model. Presently the adequacy of conformance testing is inferred by measuring structural coverage achieved over the model. We hypothesize that adequacy metrics for conformance testing should consider structural coverage over the requirements either in place of or in addition to structural coverage over the model. Measuring structural coverage over the requirements gives a notion of how well the conformance tests exercise the required behavior of the system.

We conducted an experiment to investigate the hypothesis stating structural coverage over formal requirements is more effective than structural coverage over the model as an adequacy measure for conformance testing. We found that the hypothesis was rejected at 5% statistical significance on three of the four case examples in our experiment. Nevertheless, we found that the tests providing requirements coverage found several faults that remained undetected by tests providing model coverage. We thus formed a second hypothesis stating that complementing model coverage with requirements coverage will prove more effective as an adequacy measure than solely using model coverage for conformance testing. In our experiment, we found test suites providing both requirements coverage and model coverage to be more effective at finding faults than test suites providing model coverage alone, at 5% statistical significance. Based on our results, we believe existing adequacy measures for conformance testing that only consider model coverage can be strengthened by combining them with rigorous requirements coverage metrics.
This work has been partially supported by NASA Ames Research Center Cooperative
Agreement NNA06CB21A, NASA IV&V Facility Contract NNG-05CB16C, and the
L-3 Titan Group.

1 Introduction
In critical avionics applications, the validation and verification phase (V&V) is
particularly costly and consumes a disproportionately large share of the devel-
opment resources. Thus, if the process of deriving test cases for V&V can be
automated to provide test suites that satisfy the most stringent standards (such
as DO-178B in civil avionics [20]), dramatic time and cost savings can be re-
alized. The current trend towards model-based development is one attempt to
address this problem. In model-based software development, the traditional test-
ing process is split into two distinct activities: one activity that tests the model
to validate that it accurately captures the customers’ high-level requirements,
and another testing activity that verifies whether the code generated (manually
or automatically) from the model is behaviorally equivalent to (or conforms to)
the model. (Note that by “model”, we are referring specifically to a high level
formal model written in a language such as Simulink or Lustre. Throughout
this paper, we refer to this simply as a “model”.) In this paper, we focus on
the second testing activity—verification through conformance testing. There are
currently several tools, such as model checkers, that provide the capability to
automatically generate conformance tests [19, 7] from formal models. In this pa-
per, we examine the effectiveness of metrics used in measuring the adequacy of
the generated conformance tests.
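(As an aside on mechanics: such model-checker-based generation is commonly done by negating each coverage obligation into a "trap property"; every counterexample the checker returns is a trace that exercises the obligation and can be replayed as a test case. The following is a minimal sketch of that loop, in Python for readability; the names Trace, trap_property, generate_tests and the check hook are hypothetical placeholders, not the tool chain used in this work.)

```python
from typing import Callable, List, Optional

# A counterexample trace: a sequence of states (variable -> value maps).
Trace = List[dict]


def trap_property(obligation: str) -> str:
    """Negate a coverage obligation so that any counterexample to the
    resulting property is a trace that satisfies the obligation."""
    return f"!({obligation})"


def generate_tests(obligations: List[str],
                   check: Callable[[str], Optional[Trace]]) -> List[Trace]:
    """`check` is a hypothetical hook that runs a model checker on a property
    and returns a counterexample trace, or None if the property holds
    (i.e., the obligation is unachievable)."""
    tests: List[Trace] = []
    for ob in obligations:
        trace = check(trap_property(ob))
        if trace is not None:        # obligation is achievable
            tests.append(trace)      # naive: one test per obligation, often redundant
    return tests
```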
For critical avionics software, DO-178B necessitates test cases used in verifi-
cation to achieve requirements coverage in addition to structural coverage over
the code. However, there is no direct and objective measure of requirements
coverage, and adequacy of tests is instead inferred by examining structural cov-
erage achieved over the model. The Modified Condition and Decision Coverage
(MC/DC) used when testing highly critical software [20] in the avionics industry
has been a natural choice to measure structural coverage for the most critical
models. In our work [21], however, we have defined coverage metrics that pro-
vide direct and objective measures of how well a test suite exercises a set of
high-level formal software requirements. We examined using requirements cov-
erage metrics, in particular the Unique First Cause (UFC) coverage metric, to
measure adequacy of tests used in model validation (or black-box testing) and
found them to be useful. To save time and effort, we would like to re-use val-
idation tests providing requirements coverage for verification of code through
conformance testing as well. This paper examines the suitability of using tests
providing requirements UFC coverage for conformance testing as opposed to
tests providing MC/DC over the model.
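For concreteness, consider a purely hypothetical requirement (not drawn from our case examples): "whenever the autopilot is engaged, a lateral mode shall eventually become active." It might be formalized in LTL as:

```latex
% Hypothetical requirement formalized in LTL (G = globally, F = eventually)
\mathbf{G}\left( \mathit{ap\_engaged} \;\rightarrow\; \mathbf{F}\,\mathit{lateral\_mode\_active} \right)
```

Loosely speaking, UFC coverage derives one temporal obligation per atomic condition of such a formula, each requiring a test trace on which that condition is the unique first cause of the requirement being satisfied; see [21] for the precise definition.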
We believe requirements coverage will be useful as an adequacy measure for
conformance testing for several reasons. First, measuring structural coverage
over the requirements gives a direct assessment of how well the conformance
tests exercise the required behavior of the system. Second, if a model is miss-
ing functionality, measuring structural coverage over the model will not expose
such defects of omission. Third, obligations for requirements coverage describe
satisfying scenarios (paths) in the model as opposed to satisfying states defined
by common model coverage obligations (such as MC/DC). We believe coverage

obligations that define satisfying paths will necessitate longer and more effective
test cases than those defining satisfying states in the model. Finally, we found
in [16] that structural coverage metrics over the model, in particular MC/DC, are
sensitive to the structure of the model used in coverage measurement. Therefore,
these metrics can be easily rendered ineffective by (purposely or inadvertently)
restructuring the model to make it easier to achieve the desired coverage.
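To illustrate the last point, the two hypothetical fragments below compute the same output, yet present MC/DC with different obligations: in the inlined form each condition must be shown to independently affect the observable outcome, whereas after introducing an intermediate variable each condition only has to affect its own, smaller decision, which is typically easier to cover. Python is used purely for readability; the actual artifacts are Simulink/Lustre models.

```python
def alarm_inlined(overspeed: bool, stall: bool, manual_override: bool) -> bool:
    # One decision with three conditions: MC/DC obligations require each of
    # overspeed, stall and manual_override to independently flip this result.
    return (overspeed or stall) and not manual_override


def alarm_restructured(overspeed: bool, stall: bool, manual_override: bool) -> bool:
    # Behaviorally identical, but the logic is split across two smaller
    # decisions; MC/DC measured over each decision separately is easier to
    # achieve, and the resulting tests tend to be weaker [16].
    hazard = overspeed or stall
    return hazard and not manual_override
```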
For these reasons, we believe that requirements coverage will serve as a
stronger adequacy measure than model coverage in measuring adequacy of con-
formance test suites. More specifically, we investigate the following hypothesis
in this paper:
Hypothesis 1 (H1): Conformance tests providing requirements UFC coverage are more effective at fault finding than conformance tests providing MC/DC over the model.
We evaluated this hypothesis on four industrial examples from the civil avion-
ics domain. The requirements for these systems are formalized as Linear Tem-
poral Logic (LTL) [5] properties. The systems were modeled in the Simulink
notation [12]. Using the Simulink models, we created implementations that we
used as the basis for the generation of large sets of mutants by randomly seed-
ing faults. We generated numerous test suites to provide 100% achievable UFC
coverage over the LTL properties (the formal requirements), and numerous test
suites to provide 100% achievable MC/DC over the model. We assessed the ef-
fectiveness of the different test suites by measuring their fault finding capability,
i.e., running them over the sets of mutants and measuring the number of faults
detected.
In our experiment we found that Hypothesis 1 was rejected on three of the
four examples at the 5% statistical significance level. This result was somewhat
disappointing since we believed that the requirements coverage would be effec-
tive as a conformance testing measure. The astute reader might point out that
the result might not be surprising since the effectiveness of the requirements-
based tests providing UFC coverage heavily depends on the ‘goodness’ of the
requirements set; in other words, a poor set of requirements leads to poor tests.
In this case, however, we worked with case examples with very good sets of re-
quirements and we had expected better results. Nevertheless, we found that the
tests providing requirements UFC coverage found several faults that remained
undetected by tests providing MC/DC over the model. We thus formed a second
hypothesis stating that complementing model coverage with requirements cov-
erage will prove more effective as an adequacy measure than solely using model
coverage for conformance testing. To investigate this, we formulated and tested
the following hypothesis:
Hypothesis 2 (H2): Conformance tests providing requirements UFC coverage in addition to MC/DC over the model are more effective at fault finding than conformance tests providing only MC/DC over the model.
In our second set of experiments, the combined test suites were significantly
more effective than MC/DC test suites on three of the four case examples (at the

5% statistical significance level). For these examples, UFC suites found several
faults not revealed by the MC/DC suites, making the combination of UFC and
MC/DC more effective than MC/DC alone. The relative improvement was in
the range of 4.3% to 10.8% on these examples. We strongly believe that for
the case example that did not support Hypothesis 2, the MC/DC suite found
all possible faults, making improvement with the combined suites impossible.
Based on our results, we believe that existing adequacy measures for conformance
testing based solely on structural coverage over the model (such as MC/DC) can
be strengthened by combining them with requirements coverage metrics such as
UFC. It is worth noting that Briand et al. found similar results in their study [3],
though in the context of state-based testing for complex component models in
object-oriented software. Combining a state-based testing technique for classes
or class clusters modeled with statecharts [8], with a black-box testing technique,
category partition testing, proved significantly more effective in fault detection.
We recommend that future measures of conformance testing adequacy consider
both requirements and model coverage, either by combining existing metrics,
such as MC/DC and UFC, or by defining new metrics that account for both.
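Statistical significance in these comparisons is assessed with a permutation test, in which a reference distribution is obtained from all possible permutations of the observed fault finding measurements [6, 11]. The sketch below is a one-sided Monte Carlo variant of such a test (random permutations rather than exhaustive enumeration); it is illustrative only, not the analysis script used in the study.

```python
import random
from typing import List, Sequence


def permutation_test(x: Sequence[float], y: Sequence[float],
                     iterations: int = 10000, seed: int = 0) -> float:
    """One-sided permutation test of H0: mean(x) <= mean(y).

    x and y are fault finding measurements (e.g., percentage of mutants
    killed) for two families of test suites.  Returns an approximate
    p-value obtained by randomly permuting the pooled observations."""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled: List[float] = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if sum(px) / len(px) - sum(py) / len(py) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / iterations


# Hypothetical usage: reject H0 at the 5% level if p < 0.05.
# p = permutation_test(combined_suite_scores, mcdc_suite_scores)
```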
The remainder of the paper is organized as follows. Section 2 introduces our
experimental setup and the case examples used in our investigation. Results and
statistical analysis are presented in Section 3. Finally, in Sections 4 and 5, we
analyze and discuss the implications of our results and point to future directions.
2 Experiment
We use four industrial systems in our experiment: two models from a display
window manager for an air-transport class aircraft (DWM 1, DWM 2), and two
models representing flight guidance mode logic for business and regional jet
class aircraft (Vertmax Batch and Latctl Batch). All four systems were judged
by their developers to have good sets of requirements. We conducted the
experiments for each case example using the steps outlined below
(elaborated in later sections):
1. Generate and reduce test suites to provide requirements UFC cov-
erage: We generated a test suite to provide UFC coverage over the formalized
LTL requirements. This test suite was naïvely generated, one test case for every
UFC obligation, and thus highly redundant. We reduced the test suite randomly
while maintaining UFC coverage over the requirements. We generated three such
randomly reduced test suites.
2. Generate and reduce test suites to provide MC/DC over the model:
We naïvely generated a test suite to provide MC/DC over the model. We then
randomly reduced the test suite to maintain MC/DC over the model. We gen-
erated three such reduced test suites.
3. Combined test suites that provide MC/DC + requirements UFC:
Among the reduced MC/DC suites from the previous step, we selected the most
effective MC/DC test suite based on its fault finding ability. We merged this
test suite with each of the reduced UFC test suites from the first step. The

combined suites thus provide both MC/DC over the model and UFC coverage
over the requirements.
4. Generate mutants: We randomly seeded faults in the correct implemen-
tation and generated three sets of 200 mutants using the method outlined in
Section 2.3.
5. Assess and compare fault finding: We ran each of the test suites from steps 1, 2 and 3 (providing requirements UFC coverage, MC/DC over the model, and MC/DC + requirements UFC coverage, respectively) against each set of mutants and against the model, which serves as the oracle in conformance testing. We say that a mutant is killed (or detected) by a test suite if any of the test cases in the suite results in different output values between the model and the mutant. We recorded the number of mutants killed by each test suite and computed the fault finding ability as the percentage of seeded mutants that were killed. A sketch of the reduction and fault finding measurement is given below.
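The following Python sketch outlines the core of steps 1, 2 and 5: randomly reducing a naïvely generated suite while preserving coverage, and measuring fault finding against a set of mutants. The coverage and execution hooks (obligations_covered, outputs) are hypothetical stand-ins for the actual tooling, and the greedy random reduction shown is one plausible reading of "reduced randomly while maintaining coverage".

```python
import random
from typing import Callable, List, Sequence, Set


def reduce_suite(tests: Sequence[object],
                 obligations_covered: Callable[[object], Set[str]],
                 rng: random.Random) -> List[object]:
    """Randomly reduce a test suite while maintaining its coverage:
    visit tests in random order and keep one only if it covers an
    obligation not yet covered by the tests kept so far."""
    order = list(tests)
    rng.shuffle(order)
    kept, covered = [], set()
    for t in order:
        new = obligations_covered(t) - covered
        if new:
            kept.append(t)
            covered |= new
    return kept


def fault_finding(suite: Sequence[object], model: object,
                  mutants: Sequence[object],
                  outputs: Callable[[object, object], list]) -> float:
    """Percentage of mutants killed: a mutant is killed if some test in the
    suite produces outputs on the mutant that differ from the model's."""
    killed = sum(
        1 for m in mutants
        if any(outputs(t, m) != outputs(t, model) for t in suite)
    )
    return 100.0 * killed / len(mutants)
```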
2.1 Case Examples
In our experiment, we use four industrial systems. All four systems were modeled
using the Simulink notation from MathWorks Inc.
Display Window Manager Models (DWM 1 and DWM 2): The Display
Window Manager models, DWM 1 and DWM 2, represent 2 of the 5 major
subsystems of the Display Window Manager (DWM) of an air transport-level
commercial displays system. The DWM acts as a 'switchboard' for the system
and has several responsibilities related to routing information to the displays
and managing the location of two cursors that the pilot and copilot can use to
control applications.
Flight Guidance System: A Flight Guidance System is a component of
the overall Flight Control System (FCS) in a commercial aircraft. It compares
the measured state of an aircraft (position, speed, and altitude) to the desired
state and generates pitch and roll-guidance commands to minimize the difference
between the measured and desired state. The FGS consists of the mode logic,
which determines which lateral and vertical modes of operation are active and
armed at any given time, and the flight control laws that accept information
about the aircraft’s current and desired state and compute the pitch and roll
guidance commands. The two FGS models in this paper focus on the mode logic
of the FGS. The Vertmax Batch and Latctl Batch models describe the vertical
and lateral mode logic for the flight guidance system.
2.2 Test Suite Generation and Reduction
We generated test suites to provide UFC coverage over formal LTL requirements
and to provide MC/DC over the model. The approach to generate and reduce
