Are We There Yet? Determining the Adequacy of Formalized Requirements and Test Suites
Summary
1 Introduction
- Model-Based Development (MBD) refers to the use of domain-specific modeling notations to create models of a desired system early in the development lifecycle.
- Model-Based Development has been reported to significantly reduce costs while also improving quality.
- The authors note that the checked coverage metric judges the quality of the test oracle — a program with no assertions will have no checked coverage.
- The authors' hypothesis is that this metric can be leveraged to better assess the quality of an automated testing process in MBD, where formalized requirements serve as oracles for auto-generated tests [28].
- The authors combine the results of checked coverage with the results of requirements coverage to determine for a given model whether its requirements and test suite are adequate.
2 Motivation
- Consider the control software for an infusion pump, a medical device that is typically used to infuse liquid drugs into a patient’s body in a controlled fashion.
- The “ALARM” subsystem is responsible for monitoring hazards (CheckAlarm state machine) with different levels of severity in the system, and alerting the clinicians (Audio and Visual state machines) to take the appropriate action when such conditions occur.
- The authors auto-generate the source code from the Simulink model, formalize the requirements as boolean expressions, and automatically generate the test cases from the model.
- To motivate the utility of their proposed approach, the authors use a snippet of auto-generated code from the Audio state machine in Fig.
- This example demonstrates that the set of checked statements is smaller than the set of covered statements.
3 Methodology
- There are three inputs to their technique: the model of the system being analyzed, a set of test cases (manual or auto-generated) that exercise the model, and a set of formalized requirements of the model as shown in Fig.
- A dynamic backward slice is used to extract the set of program statements that operate on variables whose values are checked in the assertions.
- This is termed as checked coverage while all other executed statements are categorized as unchecked coverage.
- The algorithm takes as input an auto-generated program M , the test suite T for exercising the behaviors of the program, and the set of assertions that encode the formalized requirements.
- Dynamic slicing is used to compute the basic form of checked coverage.
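The basic form of checked coverage described above can be sketched as a simple set computation. The data structures below are hypothetical; the paper itself works on generated C code and Frama-C dynamic slices.

```python
# Hypothetical sketch: checked coverage = executed statements that appear
# in some assertion's dynamic backward slice; everything else executed is
# unchecked. Line numbers stand in for program statements.

def checked_coverage(executed_lines, assertion_slices):
    """Partition executed lines into checked and unchecked sets.

    executed_lines: set of line numbers exercised by the test suite.
    assertion_slices: dict mapping each formalized requirement (assertion)
        to the set of lines in its dynamic backward slice.
    """
    checked = set()
    for slice_lines in assertion_slices.values():
        checked |= slice_lines & executed_lines
    unchecked = executed_lines - checked
    return checked, unchecked

executed = {10, 11, 12, 20, 21, 30}
slices = {"R1": {10, 11}, "R2": {20}}
checked, unchecked = checked_coverage(executed, slices)
print(sorted(checked))    # → [10, 11, 20]: statements some oracle checks
print(sorted(unchecked))  # → [12, 21, 30]: executed but never checked
```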
3.1 Coverage of Requirements
- In this work the authors use the Modified Condition/Decision Coverage (MC/DC) metric to evaluate the assertion coverage for a given test suite.
- MC/DC coverage of a requirement encoded as an assertion requires that each condition in the assertion takes on all possible outcomes at least once and each condition is shown to independently affect the assertion’s outcome.
- The authors use the masking form of MC/DC to determine the independence of the conditions in the assertion.
- A condition is masked if changing its value does not affect the outcome of the assertion.
- But if only one of an assertion's three obligations is satisfied by the test, then the authors report 33% coverage of the assertion.
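As an illustration of the masking idea (a minimal sketch of our own, not the authors' tooling), a condition's masking status for a given test can be determined by flipping its value and checking whether the assertion's outcome changes:

```python
# A condition is masked for a test if flipping its truth value leaves the
# assertion's outcome unchanged; an unmasked condition independently
# affects the outcome, as masking MC/DC requires.

def is_masked(assertion, values, cond):
    """assertion: a function over a dict of condition truth values."""
    flipped = dict(values, **{cond: not values[cond]})
    return assertion(values) == assertion(flipped)

# Example assertion with two conditions, a and b.
assertion = lambda v: v["a"] and v["b"]

# With b False, condition a is masked: the conjunction is False either way.
print(is_masked(assertion, {"a": True, "b": False}, "a"))  # → True
# With b True, condition a independently affects the outcome.
print(is_masked(assertion, {"a": True, "b": True}, "a"))   # → False
```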
3.2 A More Precise Dynamic Backward Slice
- The authors propose a more precise dynamic backward slice that takes into account which parts of the assertion are covered and whether certain values of program variables are not used when the assertion is evaluated.
- The authors leverage the masking information within an assertion for a given test to generate a more precise dynamic backward slice.
- The authors then collect all of the program statements in the execution trace that impact these checked values.
- Even though there are values of y being written to in the execution trace, since they are not being used in the evaluation of the assertion, they are not added to the checked set.
- The authors believe this will reduce the size of the checked set and provide a more precise characterization of parts of the program that are being checked in the assertions.
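A minimal sketch of this refined slicing criterion, under the assumption that each assertion condition records the variables it reads: only conditions that are not masked for the test contribute seed variables for the backward slice.

```python
# Hypothetical criterion for the more precise slice: variables read only
# by masked conditions (whose values are unused when the assertion is
# evaluated) do not seed the backward slice, shrinking the checked set.

def precise_slice_criterion(conditions, masked):
    """Return the variables to slice on: those read by unmasked conditions.

    conditions: dict mapping a condition's text to the variables it reads.
    masked: set of conditions masked for this test.
    """
    seeds = set()
    for cond, vars_read in conditions.items():
        if cond not in masked:
            seeds |= vars_read
    return seeds

conditions = {"x > 0": {"x"}, "y == 2": {"y"}}
# If "y == 2" is masked for this test, writes to y are never checked,
# so y does not seed the slice:
print(sorted(precise_slice_criterion(conditions, masked={"y == 2"})))  # → ['x']
```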
3.3 Mapping Back to the Model
- In the final phase of their technique, for a given test suite, the authors report the following to the system engineers: (i) the precise checked coverage, (ii) the unchecked coverage, (iii) the uncovered code, and (iv) the coverage of the requirements.
- Note that the authors map the coverage of the code onto the model.
- The authors believe that these coverage measures help us bridge the gap between requirements, tests, and the model as discussed in [28].
- The relationship between the various types of coverage can potentially help to determine the source of incompleteness in either tests, requirements, or the model.
- Low coverage of the requirements coupled with low checked coverage could be indicative of missing tests and/or missing requirements.
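One way to read these relationships (our interpretation of the discussion, not a rule the authors state) is as a rough diagnostic over the reported coverage measures:

```python
# Rough diagnostic sketch: the threshold and the hint texts are our
# assumptions, chosen only to illustrate how the coverage measures
# combine to point at missing tests, requirements, or both.

def diagnose(stmt_cov, req_cov, checked_cov, low=0.5):
    hints = []
    if req_cov < low and checked_cov < low:
        hints.append("possibly missing tests and/or missing requirements")
    if stmt_cov >= low and checked_cov < low:
        hints.append("executed code largely unchecked: requirements may be incomplete")
    if stmt_cov < low:
        hints.append("uncovered code: test suite may be inadequate")
    return hints or ["coverage measures look adequate"]

# High statement and requirements coverage but low checked coverage
# suggests the oracles do not observe much of the executed code.
print(diagnose(stmt_cov=0.87, req_cov=0.80, checked_cov=0.30))
```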
4.1 Case Examples
- The authors consider three different systems: a medical device controller, an avionics system controller and a general appliance controller.
- The second column gives the number of auto-generated source lines of code (LOC); column three presents the number of requirements available for each test suite; column four describes the source of the test suites.
- This system was also developed using Mathworks Simulink/Stateflow tool and its source code was generated using Simulink Coder.
- Their goal was to analyze the adequacy of the sparse requirements for the test cases.
- The authors generated code for the microwave system using the Gryphon Tool Suite [34].
4.2 Tools and Experiment Set up
- The authors use a combination of commercially available and free open source tools to implement their approach.
- As previously mentioned, the test suites and the source code are generated using various sources and tools in order to generate a variety of artifacts and determine the efficacy of the different test suites based on their metrics.
- The total number of obligations satisfied by the test suite is recorded and reported.
- The Frama-C slicing plugin requires the slicing criterion to be expressed using ACSL [4], a formal specification language used for specifying behavioral properties of C source code.
- Once all slices and execution traces are obtained, the slices are compared with the execution trace to identify the checked and unchecked covered lines of code.
4.3 Analysis of the Results
- Table 2 shows the structural and requirements coverage metrics for the artifacts for a given test suite.
- Similarly MCR 2 has statement and condition coverage of 87% and 100% respectively and requirements coverage of 80%.
- The results demonstrate that, overall, the checked coverage in Table 3 is lower compared to the set of covered statements shown in Table 2.
- Using the more precise dynamic slicing technique proposed in this work, the checked coverage decreases even further while the unchecked coverage increases.
5 Discussion
- The authors summarize the results of the empirical evaluation and provide some recommendations for improvement based on the data.
- This is not surprising since there are only three requirements for the model.
- The ALM 2, DCK 2, MCR 2 examples have reasonable statement and requirements coverage but low precise checked coverage.
- The variables used in these lines are then traced back to their source blocks in the model, as shown in Figure 5.
- Using this information, a system engineer might want to add a requirement that would check if the system has been IDLE for more than a certain amount of time.
7 Conclusion
- Test cases are produced using two main techniques: (i) manual and (ii) automated test case generation.
- Even for manually generated tests, defining a precise oracle is often a difficult endeavor.
- Recent work presents a stronger notion of coverage, checked coverage, compared to traditional structural notions of simply covered and uncovered [29,30].
- The approach presented here allows us to connect the dots between test cases, requirements, and the model.
Frequently Asked Questions (12)
Q2. What is the purpose of the observability coverage metric?
The observability coverage can be used to determine whether erroneous effects that are activated by the inputs can be observed at the outputs.
Q3. What was the source code of the ALARM subsystem?
The model of the ALARM subsystem was developed as a multi-level hierarchical state machine using the Mathworks Simulink/Stateflow tool.
Q4. What is the recommendation for the test suite?
Their recommendation is to first augment the test suite with tests that exercise additional parts of the code, then try to identify missing requirements, and finally measure the requirements coverage with the augmented test cases.
Q5. What is the definition of a dynamic slice?
Any program statements that read or write variables used in the assertion, as well as program statements computed by transitive closure of the reads and writes, are part of the dynamic slice.
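This transitive closure can be sketched over a recorded execution trace (the trace format below, a list of line/writes/reads triples, is a simplifying assumption; the paper uses the Frama-C slicing plugin on C code):

```python
# Minimal dynamic backward slice: walk the trace backwards, growing the
# set of relevant variables through the transitive closure of writes and
# reads, and collect every statement that defines a relevant variable.

def dynamic_slice(trace, criterion_vars):
    relevant = set(criterion_vars)
    slice_lines = set()
    for line, writes, reads in reversed(trace):
        if writes & relevant:
            slice_lines.add(line)
            # The writes are now explained; their inputs become relevant.
            relevant = (relevant - writes) | reads
    return slice_lines

trace = [
    (1, {"x"}, set()),    # x = input()
    (2, {"y"}, set()),    # y = input()
    (3, {"z"}, {"x"}),    # z = x + 1
    (4, {"out"}, {"z"}),  # out = z * 2
]
# Line 2 never influences "out", so it is excluded from the slice.
print(sorted(dynamic_slice(trace, {"out"})))  # → [1, 3, 4]
```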
Q6. What is the third case example of a microwave controller?
The third case example is a microwave’s controller system used in their previous work [28], that was also modeled as hierarchical state machines using the MathWorks Stateflow notation.
Q7. What is the metric used to determine the evaluation of a test suite?
In recent work, a metric proposed by Schuler and Zeller in [29,30] addresses observability, but does so in a post-priori way: given a test suite and a set of requirements specified as assertions, it uses dynamic backward slicing from the requirements (assertions) to determine the set of program statements that affect the evaluation of the requirement.
Q8. What is the accurate technique for dynamic taint analysis?
More accurate techniques for information flow modeling, such as [35], define path conditions to prove noninterference, that is, the non-observability of a variable or expression on a particular output.
Q9. What are the three different systems that the authors consider?
The authors consider three different systems: a medical device controller, an avionics system controller and a general appliance controller.
Q10. What was the main purpose of the test suite for the docking example?
For the Docking example, the authors generated a random test suite using the Reactis tool and another test suite with high structural coverage using MathWorks Simulink Design Verifier (SDV) [21] .
Q11. What is the definition of observability testing?
For software, dynamic taint analysis, or dynamic information flow analysis, marks and tracks data in a program at runtime in order to determine observability.
Q12. What causes the consequent of the requirement to be false?
The values Hazard := 3 and Disable Audio := 2 cause the antecedent of the requirement (Hazard >= 3 ∧ Disable Audio = 0) to be false, since Disable Audio ≠ 0; hence, the consequent of the requirement (Audio Command = 1) is not evaluated.
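This masking behavior mirrors short-circuit evaluation. A sketch in Python follows; the encoding of the requirement as an implication, and the snake_case variable names, are our assumptions:

```python
# The requirement "Hazard >= 3 AND Disable_Audio == 0 implies
# Audio_Command == 1", encoded as (not antecedent) or consequent.

def requirement(hazard, disable_audio, audio_command):
    return not (hazard >= 3 and disable_audio == 0) or audio_command == 1

# Disable_Audio = 2 makes the antecedent false, so the requirement holds
# regardless of Audio_Command: the consequent is masked for this test,
# and writes to Audio_Command go unchecked.
print(requirement(hazard=3, disable_audio=2, audio_command=0))  # → True
print(requirement(hazard=3, disable_audio=2, audio_command=1))  # → True
```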