What is the future work in "Towards systematic testing of distributed real-time systems"?

Future pursuits include experimentally validating the usefulness of the presented results, extending the testing methodology, devising testability-increasing design rules for DRTS, employing the technique for exact calculations of response-times in fixed priority scheduled systems, and investigating the benefits of using the testability measure as a new heuristic in the generation of highly testable static schedules.

What can be done to achieve reproducible testing?

If the authors also have the possibility to control the parameters: input, output, time and synchronization, not only observe them, the authors can enforce specific execution scenarios thereby achieving reproducible testing.

How can the authors control the execution times in other use-cases?

Control over the execution times in other use-cases can easily be achieved by incorporating delays in the jobs, or running dummies, as long as they stay within each job’s execution time range [BCET, WCET].
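As a rough illustration of the delay idea above, the sketch below computes how much padding a job needs to hit a chosen execution time. The function name `padding_delay` and its parameters are hypothetical (not from the paper), and the check that the target lies within [BCET, WCET] is an assumption about how the constraint would be enforced.

```python
def padding_delay(measured, target, bcet, wcet):
    """Return the delay to add so that a job's execution time equals `target`.

    Hypothetical helper: `measured` is the time the job actually used,
    `target` is the execution time we want to enforce for the test.
    """
    # The enforced time must stay inside the job's execution time range.
    if not bcet <= target <= wcet:
        raise ValueError("target must lie within [BCET, WCET]")
    # A delay can only lengthen an execution, never shorten it.
    if measured > target:
        raise ValueError("job already ran longer than the target")
    return target - measured


if __name__ == "__main__":
    # A job that ran for 3 time units, padded up to a target of 5.
    print(padding_delay(measured=3, target=5, bcet=2, wcet=6))
```

In a real system the returned value would drive a busy-wait or a dummy job; that part is left out here because it depends on the target kernel.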

What is the only way to guarantee a shared memory system?

The only possibility to guarantee (1) in a shared memory system is to use a hardware memory protection scheme, or to by design eliminate shared resources.

What is the main benefit of the execution order analysis?

A positive side effect of the execution order analysis is that the authors get exact response-times for the jobs, even when interrupts afflict the system.

How many releases will be made for a task with a period less than LCM(T)?

For tasks with periods less than LCM(T), multiple releases will be made in the interval [0, Jmax], e.g., for a task i with 2Ti = LCM(T) there will be two corresponding jobs in their task model, released at ri and ri+Ti.
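The mapping from periodic tasks to jobs can be sketched as follows. This is a minimal illustration under assumptions: the function name `expand_to_jobs` and the `(period, release offset)` tuple layout are hypothetical, and only release times are modelled (execution times, deadlines and priorities are omitted).

```python
from math import lcm  # available in Python 3.9+


def expand_to_jobs(tasks):
    """Expand periodic tasks into the jobs released in [0, LCM(T)].

    tasks: dict mapping a task name to (period, release offset).
    Returns (j_max, jobs) where jobs is a list of (task name, release time).
    """
    j_max = lcm(*(period for period, _ in tasks.values()))
    jobs = []
    for name, (period, offset) in tasks.items():
        # A task with period T_i contributes LCM(T) / T_i jobs.
        for n in range(j_max // period):
            jobs.append((name, offset + n * period))
    return j_max, sorted(jobs, key=lambda job: job[1])


if __name__ == "__main__":
    # Task B has 2 * T_B = LCM(T), so it yields two jobs: r_B and r_B + T_B.
    print(expand_to_jobs({"A": (10, 0), "B": (5, 1)}))
```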

What is the effect of reducing the execution time jitter?

An interesting conclusion that can be drawn from these types of jitter, and their effect on the execution order graph, is that: 1. Minimizing the execution time jitter minimizes the preemption and release intervals, with the positive effect of reducing the preemption “hit” window, and thus reducing the number of execution order scenarios.

What is the effect of a parallel composition of the individual EOGs?

In the case of perfectly synchronized clocks this essentially amounts to performing a parallel composition of the individual EOGs, using standard techniques for composing timed transition systems [Sifakis1996].

What is the execution order graph for a set of jobs?

The execution order graph for a set of jobs J is generated by a call Eog(ROOT, {}, 0, 0, [0, JMAX]), i.e., with a root node, an empty ready set, and the release-times a and b set to zero, plus the considered interval SI. The parameters are: n, the previous node; rdy, the set of ready jobs; a to b, the release interval; and SI, the considered interval.

What is the objective measure of system testability?

The number of execution orderings is an objective measure of system testability, and can thus be used as a metric for comparing different designs, and schedules.
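To make the metric concrete, the sketch below counts execution orderings by brute force for a single node. It is not the authors' execution order graph algorithm: it assumes integer execution times, simulates fixed priority preemptive scheduling one time unit at a time, and identifies an ordering only by the sequence of continuous executions (timing is ignored). The `Job` type, the function names, and the convention that a larger number means higher priority are all assumptions made for illustration.

```python
from itertools import product
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    release: int
    bcet: int
    wcet: int
    priority: int  # assumption: larger number means higher priority


def execution_order(jobs, exec_times):
    """Simulate one run and return the sequence of continuous executions."""
    remaining = {job.name: c for job, c in zip(jobs, exec_times)}
    order, t = [], 0
    while any(remaining.values()):
        ready = [j for j in jobs if j.release <= t and remaining[j.name] > 0]
        if ready:
            running = max(ready, key=lambda j: j.priority)
            remaining[running.name] -= 1
            # Record a new entry only when the running job changes.
            if not order or order[-1] != running.name:
                order.append(running.name)
        t += 1
    return tuple(order)


def count_orderings(jobs):
    """Count distinct orderings over all integer execution times."""
    ranges = [range(j.bcet, j.wcet + 1) for j in jobs]
    return len({execution_order(jobs, c) for c in product(*ranges)})


if __name__ == "__main__":
    with_jitter = [Job("A", 0, 2, 4, 1), Job("B", 3, 1, 1, 2)]
    no_jitter = [Job("A", 0, 2, 2, 1), Job("B", 3, 1, 1, 2)]
    print(count_orderings(with_jitter), count_orderings(no_jitter))
```

With execution time jitter on job A, B either starts after A completes or preempts it, giving two orderings; without jitter only one remains, so the second design would score as more testable under this metric.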

What is the purpose of the static periodic scheduling?

To see how static periodic scheduling can be mapped to their task model consider a static schedule for a set of periodic tasks T, where each task Ti∈T has the following attributes and relations to other tasks: • Period (Ti), release time (ri; start time relative period start), deadline (Di; latest completion time relative ri), worst case execution time (WCETi), best case execution time (BCETi), and priority (pi; each priority is unique).

What is the requirement for integration testing of distributed real-time systems?

In order to perform integration testing of distributed real-time systems the following is required:
• A feasible global schedule, including probes.

What is the definition of a testable execution order?

To facilitate reproducible testing the authors must identify which execution orderings, or parts of execution orderings, can be enforced without introducing any probe effect.

How many times can an interrupt be preempted?

The maximum value of w is defined by (4-3) and is based on the assumption that all the interrupts are ready for preemption exactly at the beginning of the interval; this maximizes the number of times the interrupts can preempt the interval.

$$w = WCET_A + \sum_{k \in interrupts} \left\lceil \frac{w}{T_k^{min}} \right\rceil \cdot WCET_k \qquad \text{(4-3)}$$

This equation can be solved using the standard RTA iteration technique [Joseph1986,Audsley1995].
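The iteration can be sketched as below, reading (4-3) as the usual response-time recurrence: w equals the WCET of the interval plus, for each interrupt k, the number of arrivals within w (w divided by the minimum inter-arrival time, rounded up) times the interrupt's WCET. The function name, the `(t_min, wcet_k)` tuple layout, integer time units, and the `limit` cut-off are assumptions for illustration.

```python
from math import ceil


def interrupt_inflated_time(wcet_a, interrupts, limit=10_000):
    """Iterate w = wcet_a + sum(ceil(w / t_min) * wcet_k) to a fixed point.

    interrupts: list of (t_min, wcet_k) pairs, one per interrupt source.
    limit: hypothetical cut-off so the loop ends if no fixed point exists.
    """
    w = wcet_a
    while w <= limit:
        nxt = wcet_a + sum(ceil(w / t_min) * wcet_k
                           for t_min, wcet_k in interrupts)
        if nxt == w:
            return w  # fixed point reached
        w = nxt
    raise ValueError("no fixed point below limit")


if __name__ == "__main__":
    # Iterates 4 -> 7 -> 9 -> 9 for this example.
    print(interrupt_inflated_time(4, [(10, 1), (6, 2)]))
```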

(Open Access) Towards systematic testing of distributed real-time systems (1999) | Henrik Thane

Q: What are the contributions in "Towards systematic testing of distributed real-time systems"?

In this paper the authors present a method for identifying all the possible orderings of task starts, preemptions and completions for tasks executing in distributed real-time systems. This allows test methods for sequential programs to be used, since the authors can regard each identified ordering as a sequential program.

Q: What is the main problem of the probe-effect in distributed real-time systems?

For a distributed real-time system the authors must allocate resources for the probes, including execution time, memory, and communication bus bandwidth, and account for the probes when scheduling [Thane1999].

TOWARDS SYSTEMATIC TESTING OF DISTRIBUTED REAL-TIME SYSTEMS

Henrik Thane, Hans Hansson

Mälardalen Real-Time Research Centre

www.mrtc.mdh.se

Department of Computer Engineering

Mälardalen University

P.O. Box 883, S-721 23, Västerås, Sweden

Tel. +46 21 103157, Fax. +46 21 103110

henrik.thane@mdh.se

ABSTRACT

Reproducible and deterministic testing of sequential programs can in most cases be achieved by controlling the sequence of inputs to the program. The behavior of a distributed real-time system, on the other hand, not only depends on the inputs but also on the order and timing of the concurrent tasks that execute and communicate with each other and the environment. Hence, sequential test techniques are not directly applicable, since they disregard the significance of order and timing.

In this paper we present a method for identifying all the possible orderings of task starts, preemptions and completions for tasks executing in distributed real-time systems. This allows test methods for sequential programs to be used, since we can regard each identified ordering as a sequential program. The number of identified execution orderings can be used as an objective measure of the testability of the distributed real-time system. Such a measure is an important quality attribute, which can be utilized as a new scheduling optimization criterion for generating static schedules of high testability.

Keywords: Testing, distributed real-time systems, determinism, reproducibility, testability.

1 INTRODUCTION

A real-time system is by definition correct if it performs the correct function at the correct time. Using real-time scheduling theory we can provide guarantees that each task in the system will meet its timing requirements [Liu1973,Audsley1995,Xu1990], provided that the basic assumptions hold during run-time, e.g., task execution times and periodicity. However, scheduling theory does not give any guarantees for the system’s functional behavior, i.e., that the computed values are correct. To assess the functional correctness other types of analysis are required. One possibility is to use formal methods to verify certain functional and temporal properties of a model of the system. The formally verified properties are then guaranteed to hold in the real system, as long as the model assumptions are not violated. When it comes to validating the underlying assumptions (e.g., execution times, synchronization order and the correspondence between specification and implemented code) we must use dynamic verification techniques which explore and investigate the run-time behavior of the real system, i.e., testing [Rushby1995a]. Testing can also be used as a complement to, or a replacement for, formal methods in the functional verification.

When a sequential program is tested, it is necessary to control the sequence of inputs, and the start conditions, in order to guarantee reproducibility [McDowell1989]. That is, given the same initial state and input, the sequential program will deterministically produce the same output on repeated executions, even in the presence of systematic faults [Rushby1995b]. Reproducibility is essential when performing regression testing or cyclic debugging [Schütz1994], where the same test cases are run repeatedly with the intent either to validate that an error correction had the desired effect, or simply to make it possible to find the error when a failure has been observed.

The behavior of a concurrent program, on the other hand, is not only dependent on the sequence of inputs but also on the order in which the concurrent programs execute and communicate. In real-time systems the behavior of the system is also dependent on when the inputs arrive and when the programs execute and communicate with each other, and with the environment. Trying to apply test techniques of sequential programs to distributed real-time systems is therefore bound to lead to non-determinism (and non-reproducibility), because control is only forced on the inputs, disregarding the significance of order and timing. For instance, in a real-time system with shared resources different inputs may lead to different execution paths. These paths will in turn lead to different execution times for the tasks, which depending on the design may lead to different orders of access to the shared resources. As a consequence there may be different system behaviors if the outcome of the operations on the shared resources depends on the ordering of the accesses. Hence, in order to facilitate systematic testing of distributed real-time systems we must, in addition to observing inputs and outputs, observe (or control) the timing and order of the sequence of inputs, as well as the timing and order of program executions.

In a system with race conditions, which naturally occur in many real-time systems, the act of intrusively observing the system will always change the odds for the outcome of the race, because the probes will add to the execution times of the racing tasks. The outcome of a race could therefore be dependent on the presence, or absence, of a probe. If we later remove the probes after observation, we not only decrease the observability, but also change the execution times again, which affects the races, and this time we do not have the probes in place to observe how the system behaves. This consequence of intrusively observing a system is called the probe-effect [McDowell1989, Gait1985] or the Heisenberg uncertainty in software [LeDoux1985].
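A toy calculation can illustrate how a probe changes the outcome of a race. The sketch assumes two tasks released at the same instant on separate processors that race to a shared resource, so the task with the shorter execution time gets there first. The task names, the numbers, and both helper functions are hypothetical.

```python
def first_to_access(exec_times):
    """Return the task that reaches the shared resource first.

    exec_times: dict mapping task name to time spent before the access,
    assuming all tasks start at the same instant.
    """
    return min(exec_times, key=exec_times.get)


def with_probe(exec_times, probed_task, probe_cost):
    """Return a copy of exec_times with the probe overhead added to one task."""
    times = dict(exec_times)
    times[probed_task] += probe_cost
    return times


if __name__ == "__main__":
    exec_times = {"A": 10, "B": 11}
    print(first_to_access(exec_times))                      # unprobed system
    print(first_to_access(with_probe(exec_times, "A", 2)))  # probe added to A
```

Here the unprobed system lets A win the race, while instrumenting A with a probe costing 2 time units hands the race to B, so the observed behavior differs from the behavior of the system being observed.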

There are thus three main problems that need to be solved to make systematic testing of distributed real-time systems possible: (1) reproducing the inputs with respect to contents, order, and timing, (2) reproducing the order and timing of the execution of the parallel programs as well as their communication with each other and the environment, and (3) eliminating the probe-effect.

We briefly discuss (1) and (3), but the focus of this paper is on (2): we present a method for deriving all the possible execution orderings for preemptive periodic real-time systems with fixed priorities that are subjected to interrupts and jitter. Each identified execution order constitutes a scenario, which can be regarded as a single sequence of subroutine calls, where each continuous execution of a task corresponds to a subroutine, and thus a scenario could be viewed as a sequential program. We can thus test each scenario with traditional sequential testing methods.

We consider task sets with recurring release patterns, executing in a distributed system, where the scheduling on each node is handled by a priority driven preemptive scheduler. This includes statically scheduled systems that are subject to preemption [Xu1990], as well as strictly periodic fixed priority systems [Liu1973, Audsley1991].

If we run the system for a duration equal to the entire schedule (i.e., a single instance of the release pattern; typically equal to the Least Common Multiple (LCM) of the period times of the involved tasks) for a specific test case, and observe the actual execution scenario, we can identify the input and the produced outputs, including their timing, as a test in the observed execution scenario, i.e., for the sequential program it represents. Hence, by classifying tests as belonging to different scenarios we can, given that each scenario yields a deterministic execution (which is our hypothesis), achieve deterministic testing. The number of tested scenarios can be used as a coverage criterion for testing of the system’s behavior during an LCM period. This is all under the assumption that we can consistently observe the global state in the distributed real-time system. In order to guarantee this consistency we assume that the system is globally scheduled, i.e., the release and execution times can be related to “the global time”, and has a global synchronized time base with a known precision.
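The classification of tests by observed scenario, and the resulting coverage figure, can be sketched as follows. This assumes each test run is recorded as a pair of a test identifier and the observed execution ordering, and that the set of identified orderings is already available from the analysis. The function names and data layout are hypothetical.

```python
from collections import defaultdict


def classify_tests(runs):
    """Group test runs by the execution ordering observed during the run.

    runs: iterable of (test_id, observed_ordering) pairs.
    """
    by_scenario = defaultdict(list)
    for test_id, ordering in runs:
        by_scenario[tuple(ordering)].append(test_id)
    return dict(by_scenario)


def scenario_coverage(by_scenario, identified):
    """Fraction of the identified scenarios exercised by at least one test."""
    tested = set(by_scenario) & set(identified)
    return len(tested) / len(identified)


if __name__ == "__main__":
    identified = {("A", "B"), ("A", "B", "A")}
    runs = [("t1", ["A", "B"]), ("t2", ["A", "B"])]
    groups = classify_tests(runs)
    print(groups, scenario_coverage(groups, identified))
```

In this example both tests fall into the same scenario, so only half of the identified scenarios are covered, and one of the two tests may be redundant with respect to execution order.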

If we also have the possibility to control the parameters (input, output, time and synchronization), not only observe them, we can enforce specific execution scenarios, thereby achieving reproducible testing. Reproducibility increases the effectiveness of testing by eliminating redundant test cases, and by making it easier to achieve the desired level of coverage.

Paper outline: Section 2 provides a discussion on the main problems in testing of distributed real-time systems, and how to deal with them. Section 3 presents our system model. Section 4 formalizes the concept of execution order and presents the algorithm for identifying all the possible execution orderings in distributed real-time systems. We also give some examples, and extend the analysis to consider the effects of interrupts. Section 5 suggests a testing strategy for achieving deterministic and reproducible testing in the context of execution order analysis. Finally, in Section 6, we conclude and give some hints on future work.

2 TESTING DISTRIBUTED REAL-TIME SYSTEMS

In this section we further discuss the handling of inputs, probe effects and reproducibility in testing of distributed real-time systems.

2.1 Inputs

The problem with reproducing the inputs with respect to contents, order and timing has been addressed [Glass1980, DeMillo1987, Somerville1992], specifically through the use of environment simulators. This is a problem inherent in all software testing. For example, the level of reliability that can be assessed using experimental statistical methods for sequential programs is limited to about $10^{-4}$ failures/hour because of the difficulty of representing very rare events accurately [Rushby1995b]. The outlook for statistical assessment of real-time software therefore looks quite grim, due to even more complex input data profiles. This is however an issue which we will not consider further in this paper.

2.2 Probe-effects

When it comes to dealing with the probe-effect in distributed real-time systems there are basically two approaches:

• Use special hardware (dual-port memories, etc.) that allows for transparent monitoring of the system [Plattner1984, Haban1990, Tsai1990, Tsai1996]. This has the severe drawback of being expensive and only enabling observations of certain aspects of the system's behavior, such as those related to the external interfaces of the micro-controller, shared resources (such as dual-port memories) and broadcast communication busses. The ever-increasing integration of functionality in general-purpose micro-controllers makes it even harder to observe the internal behavior of the system. The viability of this approach is therefore limited. However, the current trend of making application specific hardware using FPGAs and VHDL [Calvez1998] gives an opportunity to conveniently integrate non-intrusive monitoring mechanisms in the hardware.

• Use software for instrumentation [Dodd1992, Tokuda1988]. By including instrumentation code in the software (application and operating system), we can observe more than is possible with the hardware approach. The main problem here is to eliminate the probe-effect. For a distributed real-time system we must allocate resources for the probes, including execution time, memory, and communication bus bandwidth, and account for the probes when scheduling [Thane1999]. Probes can be placed at several levels, including [Thane1999]:

  • Kernel-probes – that collect information from within the kernel, task-switches, execution times, etc.
  • Inline-probes – that are inserted into application tasks
  • Probe-tasks – that are tasks dedicated to collecting data from kernel-probes, inline-probes and other probe-tasks, and
  • Probe-nodes – which are dedicated nodes that collect data from probe-tasks, and which can monitor communication busses.

  By allocating resources for observation and then leaving them in place when the system is delivered, we eliminate the probe effect; any other approach will lead to probe-effects, or very limited observability. Some execution strategies, e.g., statically scheduled real-time systems, allow us to remove probes without temporal side-effects if they are situated within temporal firewalls [Schütz1994]. That is, as long as we do not change the start-times of tasks, and their times of output (communication or access to shared resources), we can remove the probes.

In this paper we assume that the probe-effect has been eliminated by allocating sufficient resources for probes (kernel-, in-line-, task- and node-level ones), and by scheduling the system with the probes as part of the design.

2.3 Reproducibility

For reproducing the execution order and timing of the tasks in a distributed real-time system the two main approaches are:

• Deterministic replay. Here the tasks’ run-time behavior is recorded in a log over a period of time. The execution of the system can then be deterministically replayed off-line. The system cannot be suspended during run-time, but the off-line replay can be suspended and examined. If any modifications have been made to the tasks or to the inputs the logging must be repeated.

  A deterministic replay method for concurrent Ada programs is presented in [Tai1991]. They log the synchronization sequence (rendezvous) for a concurrent program P, with input X. The source code is then modified to facilitate replay, forcing certain rendezvous so that P follows the same synchronization sequence for X. This approach can reproduce the synchronization orders for concurrent Ada programs, but not the duration between significant events, because the enforcement (changing the code) of specific synchronization sequences introduces gross temporal probe-effects. The replay scheme is thus not suited for real-time systems; nor does it consider the effects of interrupts, and it is unclear how the method can be extended to handle them.

  [Tsai1990] presents a hardware monitoring and replay mechanism. Their approach can replay significant events with respect to order, access to time, and asynchronous interrupts. Monitoring is apt for real-time systems because it minimizes the probe-effect. Although replay can be performed accurately it may be slower than real execution. The main disadvantages are that the approach needs special hardware and is intended for single-processor systems only. Adapting the approach to distributed real-time systems requires extensive hardware support and rework.

  Another software-based approach is HMON [Dodd1992], which is designed for the HARTS distributed (real-time) system multiprocessor architecture [Shin1991]. A general-purpose processor is dedicated to monitoring in each multiprocessor. The probe-effect due to monitoring is eliminated by modifying system service calls to incorporate monitoring mechanisms, and by letting these be present also in the final system. The events monitored are system calls, context switches, interrupts, shared variable references, and application specific events (chosen by the programmer). The recorded events can then be deterministically replayed with respect to order, in logical time [Lamport1978], but not in real time. The system can thus not actually monitor a consistent global state and then reproduce its real-time behavior.

  There are a few disadvantages with all the above approaches [Schütz1994]:

  • One can only replay what has previously been observed, and no guarantees can be provided that every significant system behavior will be observed accurately. Also, if a program has been modified (e.g., corrected) completely new traces have to be generated.
  • Dedicated (special) hardware has to be used in order to eliminate (or minimize) the probe-effect.
  • Since replay takes place at machine level the amount of information required is usually large. All inputs and intermediate events (e.g., messages) must be kept.

• Deterministic testing. A distributed real-time system can usually be described by a set of use-cases, which are sets of cooperating tasks that jointly perform specific functions, e.g., a sample-calculate-actuate loop in a control system. A precedence relation (execution order), interactions (data-flow), and a period time [Eriksson1996a] typically define each use-case. To test a use-case, we need to control the inputs and observe (or control) the execution order.

  Because a distributed real-time system may contain several use-cases that run on the same processor, there might be many different execution orderings, due to variations in preemption points, varying execution times, and interrupts.

  The method presented in this paper aims at transforming the non-deterministic distributed real-time systems problem into a set of deterministic sequential program testing problems. This is achieved by deriving all the possible execution orderings of the distributed system, and regarding each of them as a sequential program.

  Related work can be found in [Yang1992], where all possible synchronization sequences (rendezvous) for Ada programs are identified, similar to [Tai1991], but [Yang1992] do not attempt deterministic replay; instead they test the system and consider the actual synchronization sequence as being part of the output. The number of synchronization sequences, and execution paths, exercised are used to define coverage. Similar work can be found in [Hwang1995], where they also attempt deterministic replay, though with the same side effects as [Tai1991].

3 THE SYSTEM MODEL

We assume a distributed system consisting of a set of nodes, which communicate via a broadcast network that is assumed to be temporally predictable, i.e., upper bounds on communication latencies are known or can be calculated [Kopetz1994, Tindell1995, Eriksson1996b]. Each node is a self-sufficient computing element with CPU, memory, network access, a local clock and I/O units for sampling and actuation of the external system. We further assume the existence of a global synchronized time base [Kopetz1987, Eriksson1996b] with a known precision δ, meaning that no two nodes in the system have local clocks differing by more than δ.

The software that runs on the distributed system consists of a set of concurrent tasks, communicating by message passing. The tasks are geographically distributed over the nodes, typically with more than one task on each node. All synchronization is resolved before run-time. As a consequence no action is needed to enforce synchronization in the actual program code. Mutual exclusion and precedence are guaranteed by the different release-times. The distributed system is globally scheduled, which results in a set of specific schedules for each node. At run-time we need only synchronize the local clocks to fulfill the global schedule [Kopetz1994].

Task model

We assume a set of jobs (i.e., invocations of tasks) $J$ that are released in a time interval $[0, J^{max}]$. Each job $j \in J$ has a release time $r_j$, worst case execution time ($WCET_j$), best case execution time ($BCET_j$), a deadline $D_j$ and a unique priority $p_j$. $J$ represents one instance of a recurring pattern of job executions with period $J^{max}$, i.e., job $j$ will be released at time $r_j$, $r_j + J^{max}$, $r_j + 2J^{max}$, etc. We further assume that the schedule is feasible, i.e., that each job $j$ is always completed within its deadline $D_j$.

We additionally assume a set of interrupts $I$, where each interrupt $k \in I$ has the following attributes:

• Minimum and maximum inter-arrival time ($T_k^{min}$ and $T_k^{max}$, respectively), priority $p_k$ (interrupts can preempt each other), as well as worst and best case execution time of the interrupt routines ($WCET_k$ and $BCET_k$, respectively).

We finally assume that the system is preemptive (both by jobs and interrupts) and that jobs may have identical release-times.

Relation to other task models

The above task model is fairly general since it includes both preemptive scheduling of statically generated schedules [Xu1990] and fixed priority scheduling of strictly periodic tasks [Liu1973, Audsley1991].

To see how static periodic scheduling can be mapped to our task model consider a static schedule for a set of periodic tasks $T$, where each task $T_i \in T$ has the following attributes and relations to other tasks:

• Period ($T_i$), release time ($r_i$; start time relative period start), deadline ($D_i$; latest completion time relative $r_i$), worst case execution time ($WCET_i$), best case execution time ($BCET_i$), and priority ($p_i$; each priority is unique).

• Relations between tasks can be specified by precedence relations (the execution of $T_i$ precedes that of $T_j$), mutual exclusion relations (the execution of $T_i$ does not overlap with that of $T_j$) and communications (at the end of its execution $T_i$ sends a message to $T_j$).

The length of the generated schedule will be the least common multiple of the period times of the involved tasks, LCM(T), corresponding to $J^{max}$ in our task model. For tasks with periods less than LCM(T), multiple releases will be made in the interval $[0, J^{max}]$, e.g., for a task $i$ with $2T_i = LCM(T)$ there will be two corresponding jobs in our task model, released at $r_i$ and $r_i + T_i$.


References

Time, clocks, and the ordering of events in a distributed system

Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment

Scheduling algorithms for multiprogramming in a hard real-time environment

Bounds on Multiprocessing Timing Anomalies

Software Testing Techniques

