TOWARDS SYSTEMATIC TESTING OF
DISTRIBUTED REAL-TIME SYSTEMS
Henrik Thane, Hans Hansson
Mälardalen Real-Time Research Centre
www.mrtc.mdh.se
Department of Computer Engineering
Mälardalen University
P.O. Box 883, S-721 23, Västerås, Sweden
Tel. +46 21 103157, Fax. +46 21 103110
henrik.thane@mdh.se
ABSTRACT
Reproducible and deterministic testing of sequential programs can in most cases be achieved by
controlling the sequence of inputs to the program. The behavior of a distributed real-time system, on the
other hand, not only depends on the inputs but also on the order and timing of the concurrent tasks that
execute and communicate with each other and the environment. Hence, sequential test techniques are
not directly applicable, since they disregard the significance of order and timing.
In this paper we present a method for identifying all the possible orderings of task starts, preemptions
and completions for tasks executing in distributed real-time systems. This allows test methods for
sequential programs to be used, since we can regard each identified ordering as a sequential program.
The number of identified execution orderings can be used as an objective measure of the testability of
the distributed real-time system. Such a measure is an important quality attribute, which can be utilized
as a new scheduling optimization criterion for generating static schedules of high testability.
Keywords: Testing, distributed real-time systems, determinism, reproducibility, testability.
1 INTRODUCTION
A real-time system is by definition correct if it performs the correct function at the correct time. Using
real-time scheduling theory we can provide guarantees that each task in the system will meet its timing
requirements [Liu1973, Audsley1995, Xu1990], provided that the basic assumptions, e.g., task execution
times and periodicity, hold during run-time. However, scheduling theory does not give any
guarantees for the system’s functional behavior, i.e., that the computed values are correct. To assess the
functional correctness other types of analysis are required. One possibility is to use formal methods to
verify certain functional and temporal properties of a model of the system. The formally verified
properties are then guaranteed to hold in the real system, as long as the model assumptions are not
violated. When it comes to validating the underlying assumptions (e.g., execution times,
synchronization order and the correspondence between specification and implemented code) we must
use dynamic verification techniques which explore and investigate the run-time behavior of the real
system, i.e., testing [Rushby1995a]. Testing can also be used as a complement to, or a replacement for,
formal methods in functional verification.
When a sequential program is tested, it is necessary to control the sequence of inputs, and the start
conditions, in order to guarantee reproducibility [McDowell1989]. That is, given the same initial state
and input, the sequential program will deterministically produce the same output on repeated
executions, even in the presence of systematic faults [Rushby1995b]. Reproducibility is essential when
performing regression testing or cyclic debugging [Schütz1994], where the same test cases are run
repeatedly with the intent either to validate that an error correction had the desired effect or simply to
make it possible to find the error when a failure has been observed.
The behavior of a concurrent program, on the other hand, is not only dependent on the sequence of
inputs but also on the order in which the concurrent programs execute and communicate. In real-time
systems the behavior of the system is also dependent on when the inputs arrive and when the programs
execute and communicate with each other, and with the environment. Trying to apply test techniques of
sequential programs to distributed real-time systems is therefore bound to lead to non-determinism (and
non-reproducibility), because control is only forced on the inputs, disregarding the significance of order
and timing. For instance, in a real-time system with shared resources different inputs may lead to
different execution paths. These paths will in turn lead to different execution times for the tasks, which
depending on the design may lead to different orders of access to the shared resources. As a
consequence there may be different system behaviors if the outcome of the operations on the shared
resources depends on the ordering of the accesses. Hence, in order to facilitate systematic testing of
distributed real-time systems we must, in addition to observing inputs and outputs, observe (or control)
the timing and order of the sequence of inputs, as well as the timing and order of program executions.
In a system with race conditions, which naturally occur in many real-time systems, the act of intrusively
observing the system will always change the odds for the outcome of the race, because the probes will
add to the execution times of the racing tasks. The outcome of a race could therefore be dependent on
the presence, or absence, of a probe. If we later remove the probes after observation, we do not only
decrease the observability, but also change the execution times again – which affects the races, and this
time we do not have the probes in place to observe how the system behaves. This act of intrusively
observing a system is called the probe-effect [McDowell1989, Gait1985] or the Heisenberg uncertainty
in software [LeDoux1985].
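
The influence of a probe on a race can be illustrated with a minimal sketch. It assumes two hypothetical tasks, A and B, that are released at the same instant on separate processors and access a shared resource when they complete. The function name, the parameters and the numeric values are illustrative assumptions and are not taken from any system discussed in this paper.

```python
def first_to_access(exec_a, exec_b, probe_a=0, probe_b=0):
    """Return which task reaches the shared resource first.

    Both tasks are assumed to be released at time 0 on separate
    processors; probe_a and probe_b model the execution time added
    by instrumentation code (hypothetical values).
    """
    finish_a = exec_a + probe_a
    finish_b = exec_b + probe_b
    if finish_a == finish_b:
        return "tie"
    return "A" if finish_a < finish_b else "B"


# Without probes A wins the race; a probe in A reverses the outcome.
without_probe = first_to_access(10, 11)
with_probe = first_to_access(10, 11, probe_a=2)
```

In this sketch the race is won by A when no probe is present and by B when a probe adds two time units to A, which is the change in behavior that the probe-effect refers to.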
There are thus three main problems that need to be solved to make systematic testing of distributed real-
time systems possible: (1) reproducing the inputs with respect to contents, order, and timing, (2)
reproducing the order and timing of the execution of the parallel programs as well as their
communication with each other and the environment, and (3) eliminating the probe-effect.
We are briefly going to discuss (1) and (3), but the focus of this paper is on (2) by presenting a method
for deriving all the possible execution orderings for preemptive periodic real-time systems with fixed
priorities, that are subjected to interrupts and jitter. Each identified execution order constitutes a
scenario, which can be regarded as a single sequence of subroutine calls, where each continuous
execution of a task corresponds to a subroutine, and thus a scenario could be viewed as a sequential
program. We can thus test each scenario with traditional sequential testing methods.
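
The view of a scenario as a sequential program can be sketched as follows. The example assumes a hypothetical task A that is split into two segments (A.1 and A.2) by a possible preemption from a task B, all operating on a shared variable x. The segment names, the state layout and the computations are assumptions made purely for illustration.

```python
def run_scenario(scenario, segments, state):
    """Execute the task segments of one scenario in order, like subroutine calls."""
    for name in scenario:
        segments[name](state)
    return state


def a_sample(state):   # first continuous execution of task A
    state["x"] = state["input"]


def b_scale(state):    # the single continuous execution of task B
    state["x"] *= 2


def a_output(state):   # second continuous execution of task A
    state["out"] = state["x"] + 1


segments = {"A.1": a_sample, "B": b_scale, "A.2": a_output}

# Two execution orderings of the same segments, i.e., two scenarios.
no_preemption = ["A.1", "A.2", "B"]
preempted = ["A.1", "B", "A.2"]

result_1 = run_scenario(no_preemption, segments, {"input": 3})
result_2 = run_scenario(preempted, segments, {"input": 3})
```

For the same input the two scenarios produce different outputs, but each scenario on its own is deterministic, which is what makes sequential test methods applicable per scenario.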
We consider task sets with recurring release patterns, executing in a distributed system, where the
scheduling on each node is handled by a priority driven preemptive scheduler. This includes statically
scheduled systems that are subject to preemption [Xu1990], as well as strictly periodic fixed priority
systems [Liu1973, Audsley1991].
If we run the system for a duration equal to the entire schedule (i.e., a single instance of the release
pattern; typically equal to the Least Common Multiple (LCM) of the period times of the involved tasks)
for a specific test case, and observe the actual execution scenario, we can identify the input and the
produced outputs, including their timing, to be a test in the observed execution scenario, i.e., for the
sequential program it represents. Hence, by classifying tests to belong to different scenarios we can,
given that each scenario yields a deterministic execution (which is our hypothesis), achieve deterministic
testing. The number of tested scenarios can be used as a coverage criterion for testing of the system’s
behavior during an LCM period. This is all under the assumption that we can consistently observe the
global state in the distributed real-time system. In order to guarantee this consistency we assume that the
system is globally scheduled, i.e., the release and execution times can be related to “the global time”,
and has a global synchronized time base with a known precision.
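
The classification of tests by observed scenario, and the resulting coverage measure, can be sketched as below. The sketch assumes that each test run has been reduced to a test identifier and the execution ordering observed during one LCM period, and that the set of all possible orderings is already known. The function names and the data layout are assumptions for illustration only.

```python
from collections import defaultdict


def classify(test_runs):
    """Group test identifiers by the execution ordering observed for them."""
    by_scenario = defaultdict(list)
    for test_id, observed_ordering in test_runs:
        by_scenario[tuple(observed_ordering)].append(test_id)
    return dict(by_scenario)


def scenario_coverage(by_scenario, all_scenarios):
    """Fraction of the known scenarios exercised by at least one test."""
    tested = {s for s in by_scenario if s in all_scenarios}
    return len(tested) / len(all_scenarios)


runs = [("t1", ["A", "B"]), ("t2", ["B", "A"]), ("t3", ["A", "B"])]
all_scenarios = {("A", "B"), ("B", "A"), ("A", "B", "A")}

groups = classify(runs)
coverage = scenario_coverage(groups, all_scenarios)
```

Here tests t1 and t3 belong to the same scenario, and two of the three known scenarios have been exercised.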
If we also have the possibility to control the parameters: input, output, time and synchronization, not
only observe them, we can enforce specific execution scenarios thereby achieving reproducible testing.
Reproducibility increases the effectiveness of testing by eliminating redundant test cases, and by
making it easier to achieve the desired level of coverage.
Paper outline: Section 2 provides a discussion on the main problems in testing of distributed real-time
systems, and how to deal with them. Section 3 presents our system model. Section 4 formalizes the
concept of execution order and presents the algorithm for identifying all the possible execution
orderings in distributed real-time systems. We also give some examples, and extend the analysis to
consider the effects of interrupts. Section 5 suggests a testing strategy for achieving deterministic and
reproducible testing in the context of execution order analysis. Finally, in Section 6, we conclude and
give some hints on future work.
2 TESTING DISTRIBUTED REAL-TIME SYSTEMS
In this section we further discuss the handling of inputs, probe effects and reproducibility in testing of
distributed real-time systems.
2.1 Inputs
The problem with reproducing the inputs with respect to contents, order and timing has been addressed
[Glass1980, DeMillo1987, Somerville 1992] specifically through the use of environment simulators.
This is a problem inherent in all software testing. For example, the levels of reliability that can be
assessed using experimental statistical methods for sequential programs are limited to about $10^{-4}$
failures/hour because of the difficulty of representing very rare events accurately [Rushby1995b]. The
outlook of statistical assessment of real-time software does therefore look quite grim due to even more
complex input data profiles. This is however an issue which we will not consider further in this paper.
2.2 Probe-effects
When it comes to dealing with the probe-effect in distributed real-time systems there are basically two
approaches:
• Use special hardware (dual-port memories, etc.) that allows for transparent monitoring of the
system [Plattner1984, Haban1990, Tsai1990, Tsai1996]. This has the severe drawback of being
expensive and only enabling observations of certain aspects of the system’s behavior, such as those
related to the external interfaces of the micro-controller, shared resources (such as dual-port
memories) and broadcast communication busses. The ever-increasing integration of functionality in
general-purpose micro-controllers makes it even harder to observe the internal behavior of the
system. The viability of this approach is therefore limited. However, the current trend of making
application specific hardware using FPGAs and VHDL [Calvez1998] gives an opportunity to
conveniently integrate non-intrusive monitoring mechanisms in the hardware.
• Use software for instrumentation [Dodd1992, Tokuda1988]. By including instrumentation code in
the software (application and operating system), we can observe more than is possible with the
hardware approach. The main problem here is to eliminate the probe-effect. For a distributed
real-time system we must allocate resources for the probes, including execution time, memory and
communication bus bandwidth, and account for the probes when scheduling [Thane1999]. Probes
can be placed at several levels, including [Thane1999]:
  - Kernel-probes – that collect information from within the kernel, e.g., task-switches and execution
    times.
  - Inline-probes – that are inserted into application tasks.
  - Probe-tasks – that are tasks dedicated to collecting data from kernel-probes, inline-probes and
    other probe-tasks.
  - Probe-nodes – which are dedicated nodes that collect data from probe-tasks, and which can
    monitor communication busses.
By allocating resources for observation and then leaving them in place when the system is
delivered, we eliminate the probe effect – any other approach will lead to probe-effects, or very
limited observability. Some execution strategies, e.g., statically scheduled real-time systems, allow
us to remove probes without temporal side-effects if they are situated within temporal firewalls
[Schütz1994]. That is, as long as we do not change the start-times of tasks, and their times of
output (communication or access to shared resources), we can remove the probes.
In this paper we assume that the probe-effect has been eliminated by allocating sufficient resources for
probes (kernel-, in-line-, task- and node-level ones), and by scheduling the system with the probes as
part of the design.
2.3 Reproducibility
For reproducing the execution order and timing of the tasks in a distributed real-time system the two
main approaches are:
• Deterministic replay. Here the tasks’ run-time behavior is recorded in a log over a period of time.
The execution of the system can then be deterministically replayed off-line. The system cannot be
suspended during run-time, but the off-line replay can be suspended and examined. If any
modifications have been made to the tasks or to the inputs the logging must be repeated.
A deterministic replay method for concurrent Ada programs is presented in [Tai1991]. They log
the synchronization sequence (rendezvous) for a concurrent program P, with input X. The source
code is then modified to facilitate replay; forcing certain rendezvous so that P follows the same
synchronization sequence for X. This approach can reproduce the synchronization orders for
concurrent Ada programs, but not the duration between significant events, because the enforcement
(changing the code) of specific synchronization sequences introduces gross temporal probe-effects.
The replay scheme is thus not suited for real-time systems, neither does it consider the effects of
interrupts, and it is unclear how the method can be extended to handle interrupts.
[Tsai1990] presents a hardware monitoring and replay mechanism. Their approach can replay
significant events with respect to order, access to time, and asynchronous interrupts. Monitoring is
apt for real-time systems because it minimizes the probe-effect. Although replay can be performed
accurately it may be slower than real execution. The main disadvantages are that the approach
needs special hardware and is intended for single-processor systems only. Adapting the approach to
distributed real-time systems requires extensive hardware support and rework.
Another software-based approach is HMON [Dodd1992], which is designed for the HARTS
distributed (real-time) system multiprocessor architecture [Shin1991]. A general-purpose processor
is dedicated to monitoring in each multiprocessor. The probe-effect due to monitoring is eliminated
by modifying system service calls to incorporate monitoring mechanisms, and by letting these be
present also in the final system. The events monitored are system calls, context switches, interrupts,
shared variable references, and application specific events (chosen by the programmer). The
recorded events can then be deterministically replayed with respect to order, in logical time
[Lamport1978], but not in real time. The system can thus not actually monitor a consistent global
state and then reproduce its real-time behavior.
There are a few disadvantages with all the above approaches [Schütz1994]:
  - One can only replay what has previously been observed, and no guarantees that every significant
    system behavior will be observed accurately can be provided. Also, if a program has been
    modified (e.g., corrected) completely new traces have to be generated.
  - Dedicated (special) hardware has to be used in order to eliminate (or minimize) the probe-effect.
  - Since replay takes place at machine level the amount of information required is usually large. All
    inputs and intermediate events (e.g. messages) must be kept.
• Deterministic testing. A distributed real-time system can usually be described by a set of use-cases,
which are sets of cooperating tasks that jointly perform specific functions, e.g., a sample-calculate-
actuate loop in a control system. A precedence relation (execution order), interactions (data-flow),
and a period time [Eriksson1996a] typically define each use-case. To test a use-case, we need to
control the inputs and observe (or control) the execution order.
Because a distributed real-time system may contain several use-cases that run on the same
processor, there might be many different execution orderings, due to variations in preemption
points, varying execution times, and interrupts.
The method presented in this paper aims at transforming the non-deterministic distributed real-time
systems testing problem into a set of deterministic sequential program testing problems. This is
achieved by deriving all the possible execution orderings of the distributed system, and regarding
each of them as a sequential program.
Related work can be found in [Yang1992], where all possible synchronization sequences
(rendezvous) for Ada programs are identified, similar to [Tai1991]. However, [Yang1992] does not
attempt deterministic replay; instead, the system is tested and the actual synchronization sequence
is considered part of the output. The number of synchronization sequences and execution paths
exercised is used to define coverage. Similar work can be found in [Hwang1995], which also
attempts deterministic replay, though with the same side effects as [Tai1991].
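
How varying execution times give rise to different execution orderings can be illustrated by a brute-force sketch. It is not the analysis method of this paper: it simply simulates a single node under fixed priority preemptive scheduling for every integer combination of execution times, and collects the distinct sequences of starts, preemptions, resumptions and completions. Integer time units, the convention that a larger number means higher priority, and the example job set are all assumptions made for illustration; release jitter, interrupts and clock synchronization are ignored.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Job:
    name: str
    release: int
    bcet: int      # best case execution time, at least 1
    wcet: int      # worst case execution time
    priority: int  # assumption: larger value means higher priority


def simulate(jobs, exec_times):
    """Return the event sequence for one choice of actual execution times."""
    remaining = dict(exec_times)
    events, running, t = [], None, 0
    while any(remaining.values()):
        ready = [j for j in jobs if j.release <= t and remaining[j.name] > 0]
        if ready:
            nxt = max(ready, key=lambda j: j.priority).name
            if nxt != running:
                if running is not None:
                    events.append(("preempt", running))
                fresh = remaining[nxt] == exec_times[nxt]
                events.append(("start" if fresh else "resume", nxt))
                running = nxt
            remaining[nxt] -= 1          # execute one time unit
            if remaining[nxt] == 0:
                events.append(("complete", nxt))
                running = None
        t += 1
    return tuple(events)


def orderings(jobs):
    """Distinct event sequences over all integer execution times in [BCET, WCET]."""
    names = [j.name for j in jobs]
    ranges = [range(j.bcet, j.wcet + 1) for j in jobs]
    return {simulate(jobs, dict(zip(names, combo))) for combo in product(*ranges)}


jobs = [Job("A", release=0, bcet=2, wcet=4, priority=1),
        Job("B", release=3, bcet=1, wcet=1, priority=2)]
found = orderings(jobs)
```

For this hypothetical job set two orderings result: A completes before B is released when A executes for two or three time units, and A is preempted by B when A executes for four time units. Exhaustive enumeration of this kind grows exponentially with the number of jobs, so it serves only as an illustration of the phenomenon.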
3 THE SYSTEM MODEL
We assume a distributed system consisting of a set of nodes, which communicate via a broadcast
network that is assumed to be temporally predictable, i.e., upper bounds on communication latencies are
known or can be calculated [Kopetz1994, Tindell1995, Eriksson1996b]. Each node is a self-sufficient
computing element with CPU, memory, network access, a local clock and I/O units for sampling and
actuation of the external system. We further assume the existence of a global synchronized time base
[Kopetz1987, Eriksson1996b] with a known precision δ, meaning that no two nodes in the system have
local clocks differing by more than δ.
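
The precision assumption can be stated as a simple check over simultaneous clock readings. The sketch below assumes that the local clocks of all nodes have been sampled at the same real-time instant and are given as integers in a common unit; the function name and the sample values are hypothetical.

```python
from itertools import combinations


def within_precision(clock_readings, delta):
    """True if no two simultaneous local clock readings differ by more than delta."""
    return all(abs(a - b) <= delta for a, b in combinations(clock_readings, 2))


readings = [1000, 1004, 998]   # hypothetical readings, e.g., in microseconds
ok_coarse = within_precision(readings, delta=10)
ok_fine = within_precision(readings, delta=5)
```

With these readings the largest pairwise difference is 6, so the check holds for a precision of 10 but not for a precision of 5.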
The software that runs on the distributed system consists of a set of concurrent tasks, communicating by
message passing. The tasks are geographically distributed over the nodes, typically with more than one
task on each node. All synchronization is resolved before run-time. As a consequence no action is
needed to enforce synchronization in the actual program code. Mutual exclusion and precedence are
guaranteed by the different release-times. The distributed system is globally scheduled, which results in
a set of specific schedules for each node. At run-time we need only synchronize the local clocks to
fulfill the global schedule [Kopetz1994].
Task model
We assume a set of jobs (i.e., invocations of tasks) $J$ that are released in a time interval $[0, J_{max}]$. Each
job $j \in J$ has a release time $r_j$, a worst case execution time ($WCET_j$), a best case execution time
($BCET_j$), a deadline $D_j$ and a unique priority $p_j$. $J$ represents one instance of a recurring pattern of job
executions with period $J_{max}$, i.e., job $j$ will be released at time $r_j$, $r_j + J_{max}$, $r_j + 2J_{max}$, etc. We
further assume that the schedule is feasible, i.e., that each job $j$ is always completed within its deadline $D_j$.
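
The job attributes and the recurring release pattern can be captured in a small data structure. This is a sketch under assumptions: integer time units are used, and the class, field and function names are chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    release: int    # r_j
    bcet: int       # BCET_j
    wcet: int       # WCET_j
    deadline: int   # D_j
    priority: int   # p_j, unique within the job set


def is_well_formed(jobs, j_max):
    """Check the basic assumptions: releases in [0, J_max], BCET <= WCET, unique priorities."""
    priorities = [j.priority for j in jobs]
    return (len(set(priorities)) == len(priorities)
            and all(0 <= j.release <= j_max and 0 < j.bcet <= j.wcet for j in jobs))


def release_times(job, j_max, instances):
    """Release times r_j, r_j + J_max, r_j + 2*J_max, ... for the given number of instances."""
    return [job.release + n * j_max for n in range(instances)]


job_a = Job("a", release=2, bcet=1, wcet=3, deadline=10, priority=1)
first_three = release_times(job_a, j_max=20, instances=3)
```

The check does not establish feasibility of the schedule; that property is assumed to be provided by the scheduling analysis.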
We additionally assume a set of interrupts $I$, where each interrupt $k \in I$ has the following attributes:
minimum and maximum inter-arrival time ($T_k^{min}$ and $T_k^{max}$, respectively), priority $p_k$ (interrupts can
preempt each other), as well as worst and best case execution time of the interrupt routines ($WCET_k$
and $BCET_k$, respectively).
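
To indicate how these attributes are typically used, the following sketch bounds the interrupt load in a time window of length w. It relies on the standard response-time-analysis assumption that every interrupt arrives at the start of the window and then recurs at its minimum inter-arrival time, giving at most ⌈w / T_k^min⌉ arrivals. The class and function names and the numeric values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interrupt:
    name: str
    t_min: int     # minimum inter-arrival time
    t_max: int     # maximum inter-arrival time
    bcet: int      # best case execution time of the interrupt routine
    wcet: int      # worst case execution time of the interrupt routine
    priority: int


def max_arrivals(irq, w):
    """Upper bound on arrivals in a window of length w: ceil(w / t_min)."""
    return -(-w // irq.t_min)   # integer ceiling division


def max_interference(interrupts, w):
    """Upper bound on the execution time consumed by interrupts in the window."""
    return sum(max_arrivals(k, w) * k.wcet for k in interrupts)


timer = Interrupt("timer", t_min=10, t_max=15, bcet=1, wcet=2, priority=5)
load = max_interference([timer], w=25)
```

In this example the hypothetical interrupt can arrive at most three times in a window of 25 time units, consuming at most 6 time units.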
We finally assume that the system is preemptive (both by jobs and interrupts) and that jobs may have
identical release-times.
Relation to other task models
The above task model is fairly general since it includes both preemptive scheduling of statically
generated schedules [Xu1990] and fixed priority scheduling of strictly periodic tasks [Liu1973,
Audsley1991].
To see how static periodic scheduling can be mapped to our task model, consider a static schedule for a
set of periodic tasks $T$, where each task $T_i \in T$ has the following attributes and relations to other tasks:
• Period ($T_i$), release time ($r_i$; start time relative to period start), deadline ($D_i$; latest completion time
relative to $r_i$), worst case execution time ($WCET_i$), best case execution time ($BCET_i$), and priority ($p_i$;
each priority is unique).
• Relations between tasks can be specified by precedence relations ($T_i \rightarrow T_j$; the execution of $T_i$
precedes that of $T_j$), mutual exclusion relations ($T_i \# T_j$; the execution of $T_i$ does not overlap with that
of $T_j$) and communications (from $T_i$ to $T_j$; at the end of its execution $T_i$ sends a message to $T_j$).
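
The mapping from periodic tasks to jobs can be sketched as an expansion over one schedule length, taken here as the least common multiple of the task periods. The sketch assumes integer periods and release times; the class and function names, the job naming scheme and the example values are illustrative only, and relations between tasks are not modeled.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd


@dataclass(frozen=True)
class Task:
    name: str
    period: int    # T_i
    release: int   # r_i, relative to the start of each period


def lcm_of_periods(tasks):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), (t.period for t in tasks))


def expand_to_jobs(tasks):
    """Return the schedule length and (job name, release time) for every job instance."""
    j_max = lcm_of_periods(tasks)
    jobs = []
    for t in tasks:
        for n in range(j_max // t.period):
            jobs.append((f"{t.name}#{n}", t.release + n * t.period))
    return j_max, jobs


tasks = [Task("a", period=10, release=1), Task("b", period=20, release=4)]
j_max, jobs = expand_to_jobs(tasks)
```

In this example task a has half the period of task b, so it contributes two jobs to the job set while task b contributes one.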
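The length of the generated schedule will be the least common multiple of the period times of the
involved tasks, $LCM(T)$, corresponding to $J_{max}$ in our task model. For tasks with periods less than
$LCM(T)$, multiple releases will be made in the interval $[0, J_{max}]$, e.g., for a task $T_i$ with $2T_i = LCM(T)$
there will be two corresponding jobs in our task model, released at $r_i$ and $r_i + T_i$.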

REFERENCES
[Lamport1978] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7), 1978.
[Liu1973] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1), 1973.
B. Beizer. Software Testing Techniques.