Are We There Yet? Determining the Adequacy of Formalized Requirements and Test Suites
Summary
1 Introduction
- Model-Based Development (MBD) refers to the use of domain-specific modeling notations to create models of a desired system early in the development lifecycle.
- Model-Based Development has been reported to significantly reduce costs while also improving quality.
- The authors note that the checked coverage metric judges the quality of the test oracle — a program with no assertions will have no checked coverage.
- The authors' hypothesis is that this metric can be leveraged to better assess the quality of an automated testing process in MBD, where formalized requirements serve as oracles for auto-generated tests [28].
- The authors combine the results of checked coverage with the results of requirements coverage to determine for a given model whether its requirements and test suite are adequate.
2 Motivation
- Consider the control software for an infusion pump, a medical device that is typically used to infuse liquid drugs into a patient’s body in a controlled fashion.
- The “ALARM” subsystem is responsible for monitoring hazards (CheckAlarm state machine) with different levels of severity in the system, and alerting the clinicians (Audio and Visual state machines) to take the appropriate action when such conditions occur.
- The authors auto-generate the source code from the Simulink model, formalize the requirements as boolean expressions, and automatically generate the test cases from the model.
- To motivate the utility of their proposed approach, the authors use a snippet of auto-generated code from the Audio state machine in Fig.
- This example demonstrates that the set of checked statements is smaller than the set of covered statements.
3 Methodology
- There are three inputs to their technique: the model of the system being analyzed, a set of test cases (manual or auto-generated) that exercise the model, and a set of formalized requirements of the model as shown in Fig.
- A dynamic backward slice is used to extract the set of program statements that operate on variables whose values are checked in the assertions.
- This is termed as checked coverage while all other executed statements are categorized as unchecked coverage.
- The algorithm takes as input an auto-generated program M , the test suite T for exercising the behaviors of the program, and the set of assertions that encode the formalized requirements.
- Dynamic slicing is used to compute the basic form of checked coverage.
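The basic form of checked coverage described above can be sketched as a simple set computation. The data structures below are hypothetical; the paper itself works on generated C code and Frama-C dynamic slices.

```python
# Hypothetical sketch: checked coverage = executed statements that appear
# in some assertion's dynamic backward slice; everything else executed is
# unchecked. Line numbers stand in for program statements.

def checked_coverage(executed_lines, assertion_slices):
    """Partition executed lines into checked and unchecked sets.

    executed_lines: set of line numbers exercised by the test suite.
    assertion_slices: dict mapping each formalized requirement (assertion)
        to the set of lines in its dynamic backward slice.
    """
    checked = set()
    for slice_lines in assertion_slices.values():
        checked |= slice_lines & executed_lines
    unchecked = executed_lines - checked
    return checked, unchecked

executed = {10, 11, 12, 20, 21, 30}
slices = {"R1": {10, 11}, "R2": {20}}
checked, unchecked = checked_coverage(executed, slices)
print(sorted(checked))    # → [10, 11, 20]: statements some oracle checks
print(sorted(unchecked))  # → [12, 21, 30]: executed but never checked
```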
3.1 Coverage of Requirements
- In this work the authors use the Modified Condition/Decision Coverage (MC/DC) metric to evaluate the assertion coverage for a given test suite.
- MC/DC coverage of a requirement encoded as an assertion requires that each condition in the assertion takes on all possible outcomes at least once and each condition is shown to independently affect the assertion’s outcome.
- The authors use the masking form of MC/DC to determine the independence of the conditions in the assertion.
- A condition is masked if changing its value does not affect the outcome of the assertion.
- But if only one of an assertion's three obligations is satisfied by the test, then the authors report 33% coverage of the assertion.
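As an illustration of the masking idea (a minimal sketch of our own, not the authors' tooling), a condition's masking status for a given test can be determined by flipping its value and checking whether the assertion's outcome changes:

```python
# A condition is masked for a test if flipping its truth value leaves the
# assertion's outcome unchanged; an unmasked condition independently
# affects the outcome, as masking MC/DC requires.

def is_masked(assertion, values, cond):
    """assertion: a function over a dict of condition truth values."""
    flipped = dict(values, **{cond: not values[cond]})
    return assertion(values) == assertion(flipped)

# Example assertion with two conditions, a and b.
assertion = lambda v: v["a"] and v["b"]

# With b False, condition a is masked: the conjunction is False either way.
print(is_masked(assertion, {"a": True, "b": False}, "a"))  # → True
# With b True, condition a independently affects the outcome.
print(is_masked(assertion, {"a": True, "b": True}, "a"))   # → False
```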
3.2 A More Precise Dynamic Backward Slice
- The authors propose a more precise dynamic backward slice that takes into account which parts of the assertion are covered and whether certain values of program variables are not used when the assertion is evaluated.
- The authors leverage the masking information within an assertion for a given test to generate a more precise dynamic backward slice.
- The authors then collect all of the program statements in the execution trace that impact these checked values.
- Even though there are values of y being written to in the execution trace, since they are not being used in the evaluation of the assertion, they are not added to the checked set.
- The authors believe this will reduce the size of the checked set and provide a more precise characterization of parts of the program that are being checked in the assertions.
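A minimal sketch of this refined slicing criterion, under the assumption that each assertion condition records the variables it reads: only conditions that are not masked for the test contribute seed variables for the backward slice.

```python
# Hypothetical criterion for the more precise slice: variables read only
# by masked conditions (whose values are unused when the assertion is
# evaluated) do not seed the backward slice, shrinking the checked set.

def precise_slice_criterion(conditions, masked):
    """Return the variables to slice on: those read by unmasked conditions.

    conditions: dict mapping a condition's text to the variables it reads.
    masked: set of conditions masked for this test.
    """
    seeds = set()
    for cond, vars_read in conditions.items():
        if cond not in masked:
            seeds |= vars_read
    return seeds

conditions = {"x > 0": {"x"}, "y == 2": {"y"}}
# If "y == 2" is masked for this test, writes to y are never checked,
# so y does not seed the slice:
print(sorted(precise_slice_criterion(conditions, masked={"y == 2"})))  # → ['x']
```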
3.3 Mapping Back to the Model
- In the final phase of their technique, for a given test suite, the authors report the following to the system engineers: (i) the precise checked coverage, (ii) the unchecked coverage, (iii) the uncovered code, and (iv) the coverage of the requirements.
- Note that the authors map the coverage of the code onto the model.
- The authors believe that these coverage measures help us bridge the gap between requirements, tests, and the model as discussed in [28].
- The relationship between the various types of coverage can potentially help to determine the source of incompleteness in either tests, requirements, or the model.
- Low coverage of the requirements coupled with low checked coverage could be indicative of missing tests and/or missing requirements.
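One way to read these relationships (our interpretation of the discussion, not a rule the authors state) is as a rough diagnostic over the reported coverage measures:

```python
# Rough diagnostic sketch: the threshold and the hint texts are our
# assumptions, chosen only to illustrate how the coverage measures
# combine to point at missing tests, requirements, or both.

def diagnose(stmt_cov, req_cov, checked_cov, low=0.5):
    hints = []
    if req_cov < low and checked_cov < low:
        hints.append("possibly missing tests and/or missing requirements")
    if stmt_cov >= low and checked_cov < low:
        hints.append("executed code largely unchecked: requirements may be incomplete")
    if stmt_cov < low:
        hints.append("uncovered code: test suite may be inadequate")
    return hints or ["coverage measures look adequate"]

# High statement and requirements coverage but low checked coverage
# suggests the oracles do not observe much of the executed code.
print(diagnose(stmt_cov=0.87, req_cov=0.80, checked_cov=0.30))
```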
4.1 Case Examples
- The authors consider three different systems: a medical device controller, an avionics system controller and a general appliance controller.
- The second column gives the number of auto-generated source lines of code (LOC); column three presents the number of requirements available for each test suite; column four describes the source of the test suites.
- This system was also developed using Mathworks Simulink/Stateflow tool and its source code was generated using Simulink Coder.
- Their goal was to analyze the adequacy of the sparse requirements for the test cases.
- The authors generated code for the microwave system using the Gryphon Tool Suite [34].
4.2 Tools and Experiment Set up
- The authors use a combination of commercially available and free open source tools to implement their approach.
- As previously mentioned, the test suites and the source code are generated using various sources and tools in order to generate a variety of artifacts and determine the efficacy of the different test suites based on their metrics.
- The total number of obligations satisfied by the test suite is recorded and reported.
- The Frama-C slicing plugin requires the slicing criterion to be expressed using ACSL [4], a formal specification language used for specifying behavioral properties of C source code.
- Once all slices and execution traces are obtained, the slices are compared with the execution trace to identify the checked and unchecked covered lines of code.
4.3 Analysis of the Results
- Table 2 shows the structural and requirements coverage metrics for the artifacts for a given test suite.
- Similarly MCR 2 has statement and condition coverage of 87% and 100% respectively and requirements coverage of 80%.
- The results demonstrate that, overall, the checked coverage in Table 3 is lower compared to the set of covered statements shown in Table 2.
- Using the more precise dynamic slicing technique proposed in this work, the checked coverage decreases even further while the unchecked coverage increases.
5 Discussion
- The authors summarize the results of the empirical evaluation and provide some recommendations for improvement based on the data.
- This is not surprising since there are only three requirements for the model.
- The ALM 2, DCK 2, MCR 2 examples have reasonable statement and requirements coverage but low precise checked coverage.
- The variables used in these lines are then traced back to their source blocks in the model, as shown in Figure 5.
- Using this information, a system engineer might want to add a requirement that would check if the system has been IDLE for more than a certain amount of time.
7 Conclusion
- Test cases are produced using two main techniques: (i) manual and (ii) automated test case generation.
- Even for manually generated tests, defining a precise oracle is often a difficult endeavor.
- Recent work presents a stronger notion of coverage, checked coverage, compared to traditional structural notions of simply covered and uncovered [29,30].
- The approach presented here allows us to connect the dots between test cases, requirements, and the model.
Frequently Asked Questions (12)
Q2. What is the purpose of the observability coverage metric?
The observability coverage can be used to determine whether erroneous effects that are activated by the inputs can be observed at the outputs.
Q3. What was the source code of the ALARM subsystem?
The model of the ALARM subsystem was developed as a multi-level hierarchical state machine using the Mathworks Simulink/Stateflow tool.
Q4. What is the recommendation for the test suite?
Their recommendation is to first augment the test suite with tests that exercise additional parts of the code, then try to identify missing requirements, and finally measure the requirements coverage with the augmented test cases.
Q5. What is the definition of a dynamic slice?
Any program statements that read or write variables used in the assertion, as well as program statements computed by transitive closure of the reads and writes, are part of the dynamic slice.
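This transitive closure can be sketched over a recorded execution trace (the trace format below, a list of line/writes/reads triples, is a simplifying assumption; the paper uses the Frama-C slicing plugin on C code):

```python
# Minimal dynamic backward slice: walk the trace backwards, growing the
# set of relevant variables through the transitive closure of writes and
# reads, and collect every statement that defines a relevant variable.

def dynamic_slice(trace, criterion_vars):
    relevant = set(criterion_vars)
    slice_lines = set()
    for line, writes, reads in reversed(trace):
        if writes & relevant:
            slice_lines.add(line)
            # The writes are now explained; their inputs become relevant.
            relevant = (relevant - writes) | reads
    return slice_lines

trace = [
    (1, {"x"}, set()),    # x = input()
    (2, {"y"}, set()),    # y = input()
    (3, {"z"}, {"x"}),    # z = x + 1
    (4, {"out"}, {"z"}),  # out = z * 2
]
# Line 2 never influences "out", so it is excluded from the slice.
print(sorted(dynamic_slice(trace, {"out"})))  # → [1, 3, 4]
```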
Q6. What is the third case example of a microwave controller?
The third case example is a microwave’s controller system used in their previous work [28], that was also modeled as hierarchical state machines using the MathWorks Stateflow notation.
Q7. What is the metric used to determine the evaluation of a test suite?
In recent work, a metric proposed by Schuler and Zeller in [29,30] addresses observability, but does so in a post-priori way: given a test suite and a set of requirements specified as assertions, it uses dynamic backward slicing from the requirements (assertions) to determine the set of program statements that affect the evaluation of the requirement.
Q8. What is the accurate technique for dynamic taint analysis?
More accurate techniques for information flow modeling, such as [35], define path conditions to prove noninterference, that is, the non-observability of a variable or expression on a particular output.
Q9. What are the three different systems that the authors consider?
The authors consider three different systems: a medical device controller, an avionics system controller and a general appliance controller.
Q10. What was the main purpose of the test suite for the docking example?
For the Docking example, the authors generated a random test suite using the Reactis tool and another test suite with high structural coverage using MathWorks Simulink Design Verifier (SDV) [21] .
Q11. What is the definition of observability testing?
For software, dynamic taint analysis, or dynamic information flow analysis, marks and tracks data in a program at runtime in order to determine observability.
Q12. What causes the consequent of the requirement to be false?
The values Hazard := 3 and Disable Audio := 2 cause the antecedent of the requirement (Hazard >= 3 ∧ Disable Audio = 0) to be false, since Disable Audio ≠ 0; hence, the consequent of the requirement (Audio Command = 1) is not evaluated.
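This masking behavior mirrors short-circuit evaluation. A sketch in Python follows; the encoding of the requirement as an implication, and the snake_case variable names, are our assumptions:

```python
# The requirement "Hazard >= 3 AND Disable_Audio == 0 implies
# Audio_Command == 1", encoded as (not antecedent) or consequent.

def requirement(hazard, disable_audio, audio_command):
    return not (hazard >= 3 and disable_audio == 0) or audio_command == 1

# Disable_Audio = 2 makes the antecedent false, so the requirement holds
# regardless of Audio_Command: the consequent is masked for this test,
# and writes to Audio_Command go unchecked.
print(requirement(hazard=3, disable_audio=2, audio_command=0))  # → True
print(requirement(hazard=3, disable_audio=2, audio_command=1))  # → True
```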