Fault-Tolerant Rate-Monotonic First-Fit
Scheduling in Hard-Real-Time Systems
Alan A. Bertossi, Luigi V. Mancini, and Federico Rossini
Abstract—Hard-real-time systems require predictable performance despite the occurrence of failures. In this paper, fault tolerance is implemented by using a novel duplication technique where each task scheduled on a processor has either an active backup copy or a passive backup copy scheduled on a different processor. An active copy is always executed, while a passive copy is executed only in the case of a failure. First, the paper considers the ability of the widely used Rate-Monotonic scheduling algorithm to meet the deadlines of periodic tasks in the presence of a processor failure. In particular, the Completion Time Test is extended so as to check the schedulability on a single processor of a task set including backup copies. Then, the paper extends the well-known Rate-Monotonic First-Fit assignment algorithm, where all the task copies, including the backup copies, are considered by Rate-Monotonic priority order and assigned to the first processor in which they fit. The proposed algorithm determines which tasks must use active duplication and which can use passive duplication. Passive duplication is preferred whenever possible, so as to overbook each processor with many passive copies whose primary copies are assigned to different processors. Moreover, the space allocated to active copies is reclaimed as soon as a failure is detected. Passive copy overbooking and active copy deallocation allow many passive copies to be scheduled sharing the same time intervals on the same processor, thus reducing the total number of processors needed. Simulation studies reveal a remarkable saving of processors with respect to those needed by the usual active duplication approach, in which the schedule of the non-fault-tolerant case is duplicated on two sets of processors.
Index Terms—Fault tolerance, hard-real-time systems, multiprocessor systems, periodic tasks, rate-monotonic scheduling, task replication.
1 INTRODUCTION
Throughout industrial computing, there is an increasing demand for more complex and sophisticated hard-real-time computing systems. In particular, fault tolerance is one of the requirements that are playing a vital role in the design of new hard-real-time distributed systems.
A variety of schemes have been proposed to support fault-tolerant computing in distributed systems; such schemes can be partitioned into two broad classes. In the first class, which employs passive replication techniques, a passive backup copy of a primary task is assigned to one or more backup processors; when a primary task fails, the passive copies of the task are restarted on the backup processors; hence, a passive copy is executed only in the presence of a failure. In the second class, which employs active replication techniques, the same set of tasks is always executed on two or more sets of processors; every primary task has an active backup copy: if any task fails, its mirror image will continue to execute.
Many hard-real-time scheduling problems have been
found to be NP-hard: most likely, there are no optimal
polynomial-time algorithms for them [2], [11]. In particular,
scheduling periodic tasks with arbitrary deadlines is NP-
hard, even if only a single processor is available [12].
Several heuristics for scheduling periodic tasks on uni-
processor and multiprocessor systems have been proposed.
Liu and Layland [10] introduced the Rate-Monotonic (RM)
algorithm for preemptively scheduling periodic tasks on a
single processor, under the assumption that task deadlines
are equal to their periods. Joseph and Pandya [5] later
derived the Completion Time Test (CTT) for checking
schedulability of a set of fixed-priority tasks on a single
processor. RM was generalized to multiprocessor systems
by Dhall and Liu [3], who proposed, among others, the
Rate-Monotonic First-Fit (RMFF) heuristic. More refined
heuristics for multiprocessors were proposed by Burchard,
Liebeherr, Oh, and Son [1].
It is worth noting that the RM algorithm is becoming an
industry standard because of its simplicity and flexibility. It
is a low overhead greedy algorithm, which is optimal
among all fixed-priority algorithms. Moreover, it possesses
certain advantages, for example, the implementation of
efficient schedulers for aperiodic tasks, and the retiming of
intervals in order to shed the load in a predictable fashion
[8].
As for fault-tolerant scheduling algorithms, a dynamic
programming algorithm for multiprocessors was presented
in [7] which ensures that backup schedules can be
efficiently embedded within the primary schedule. An
algorithm was proposed in [9] which generates optimal
schedules in a uniprocessor system by employing a passive
replication to tolerate software failures only. The algorithms
proposed in [14] are based on a bidding strategy and dynamically recompute the schedule when a processor fails, in order to redistribute the tasks among the remaining nonfaulty processors. In [13], two algorithms are designed which reserve time for the processing of backup tasks on uniprocessors running fixed-priority schedulers. Finally, the techniques of backup overbooking and backup deallocation were introduced in [4] to achieve fault tolerance in multiprocessor systems, but for aperiodic nonpreemptive tasks only.

934 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 9, SEPTEMBER 1999

. A.A. Bertossi is with the Dipartimento di Matematica, Università di Trento, Via Sommarive 14, 38050 Trento, Italy. E-mail: bertossi@science.unitn.it.
. L.V. Mancini is with the Dipartimento di Scienze dell'Informazione, Università di Roma "La Sapienza," Via Salaria 113, 00198 Roma, Italy. E-mail: lv.mancini@dsi.uniroma1.it.
. F. Rossini is with Telecom Italia Mobile, Area Applicazioni Informatiche, Via Tor Pagnotta 90, 00143 Roma, Italy.
Manuscript received 20 June 1997.
For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number 105271.
1045-9219/99/$10.00 © 1999 IEEE
It is noted here that none of the fault-tolerant algorithms discussed above has extended the RMFF algorithm or combined in the same schedule both active and passive replication of the tasks. However, the latter idea seems potentially useful, since it provides the ability to exploit the advantages of both types of replication in the same system.
Indeed, the simplest way to achieve fault tolerance in hard-real-time systems consists in using active duplication for all tasks. An active copy presents the advantages of requiring no synchronization with its primary copy (it can run before, after, or concurrently with the other copy) and of having a larger time window for execution, namely, the whole period of the task. However, using active duplication for all tasks doubles the number of processors required in the non-fault-tolerant case. In contrast, a passive copy can be executed only if a failure prevents the corresponding primary copy from completing. A passive copy has the disadvantages of having tighter timing constraints (in the worst case it is not activated until the scheduled completion time of the primary copy) and of requiring some time overhead for synchronization with the corresponding primary copy. These drawbacks can be overcome by choosing active replication when the scheduled completion time of the primary copy is close to the deadline, that is, to the end of the period, and by having smaller execution times for the backup copies. Moreover, since the time overhead for synchronization is usually very small, it can be included in the execution time of the primary task. Most importantly, passive duplication has the great advantage of overbooking the processors: many passive copies whose primary copies are assigned to different processors can be scheduled on the same processor so as to share the same time interval. Indeed, under the assumption of a single processor failure, only one of such passive copies will actually be executed, namely, the passive copy whose primary copy was prevented from completing because of the failure. Moreover, if only one failure is tolerated, the space allocated to active copies whose primary copies are not assigned to the failed processor can be reclaimed as soon as a failure is detected. Passive copy overbooking and active copy deallocation allow fewer processors to be used with respect to the case in which active duplication is used for all tasks.
The present paper considers the problem of preemp-
tively scheduling a set of independent periodic tasks on a
distributed system, such that each task deadline coincides
with the next request of the same task, and all tasks start in-
phase. In particular, this paper extends the RMFF algorithm
to tolerate failures under the assumption that processors fail
in a fail-stop manner. The algorithm determines by itself
which tasks must use active duplication and which can use
passive duplication, preferring passive duplication when-
ever possible. The rest of the paper is organized as follows.
Section 2 gives a formal definition of the scheduling
problem and a precise specification of the fault tolerance
model. Moreover, the classical RM, CTT, and RMFF
algorithms are recalled. Section 3 provides a high-level
description of the proposed Fault-Tolerant Rate-Monotonic
First-Fit (FTRMFF) algorithm. The algorithm analysis is
done in Section 4. In particular, the ability of RM to meet the
deadlines in the presence of one processor failure is
characterized in Section 4.1, and CTT is extended in Section
4.2 so as to check the schedulability on a single processor of
a task set including backup copies. Then, such an extended
CTT is used in Section 4.3 to assign task copies to processors
following a First-Fit heuristic which employs passive copy
overbooking and active copy space reclaiming. An algo-
rithm to recover from a single processor failure is shown in
Section 4.4, while extensions to tolerate multiple processor failures and software failures are presented in Sections 4.5 and 4.6, respectively. In Section 5, simulation
experiments show that the proposed FTRMFF algorithm
requires fewer processors than the active duplication
approach. Finally, Section 6 summarizes the work and
discusses further possible extensions.
2 BACKGROUND
This section gives a formal definition of the scheduling
problem and a precise specification of the fault tolerance
model. Moreover, important properties of the well-known
RM, CTT, and RMFF algorithms are recalled.
2.1 The Scheduling Problem
A periodic task τ_i is completely identified by a pair (C_i, T_i), where C_i is τ_i's execution time and T_i is τ_i's request period. The requests for τ_i are periodic, with constant interval T_i between every two consecutive requests, and τ_i's first request occurs at time 0. The worst-case execution time for all the (infinite) requests of τ_i is constant and equal to C_i, with C_i ≤ T_i. Periodic tasks τ_1, ..., τ_n are independent, that is, the requests of any task do not depend on the execution of the other tasks. The load of a periodic task τ_i = (C_i, T_i) is U_i = C_i / T_i, while the load of the task set τ_1, ..., τ_n is

U = Σ_{1 ≤ i ≤ n} U_i.

Given n independent periodic tasks τ_1, ..., τ_n and a set of identical processors, the scheduling problem consists of finding an order in which all the periodic requests of the tasks are to be executed on the processors so as to satisfy the following scheduling conditions:
(S1) integrity is preserved, that is, tasks and processors are
sequential: each task is executed by at most one
processor at a time and no processor executes more than
one task at a time;
(S2) deadlines are met, namely, each request of any task must
be completely executed before the next request of the
same task, that is, by the end of its period;
(S3) the number m of processors is minimized.
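The task model and load defined above can be sketched in a few lines of code; `Task` and `total_load` are illustrative names, not from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    C: float  # worst-case execution time
    T: float  # request period (deadline = end of period)

    @property
    def load(self) -> float:
        # U_i = C_i / T_i
        return self.C / self.T

def total_load(tasks):
    # U = sum of the individual task loads U_i
    return sum(t.load for t in tasks)

tasks = [Task(1, 3), Task(3, 5)]
print(total_load(tasks))  # 1/3 + 3/5 ≈ 0.933
```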
2.2 The Fault-Tolerant Model
It is assumed that the processors belong to a distributed
system and are connected by some kind of communication
subsystem. The failure characteristics of the hardware are
the following:
(F1) Processors fail in a fail-stop manner, that is, a processor is either operational (i.e., nonfaulty) or ceases functioning;
(F2) All nonfaulty processors can communicate with each other;
(F3) Hardware provides fault isolation, in the sense that a faulty processor cannot cause incorrect behavior in a nonfaulty processor; in other words, processors are independent with regard to failures;
(F4) The failure of a processor P_f is detected by the remaining nonfaulty processors after the failure, but within the instant corresponding to the closest task completion time of a task scheduled on P_f.
Note that assumption (F4) can be easily satisfied by a
specific failure detection protocol as explained below, since
by assumption (F1) all the processors are assumed to be fail-
stop.
The fault-tolerant scheduling problem consists of finding a
schedule for the tasks so as to satisfy the following
additional condition:
(S4) fault tolerance is guaranteed, namely, conditions (S1)-
(S3) are verified even in the presence of failures.
In order to achieve fault tolerance, two copies of each task are used, called the primary and backup copies. The primary copy τ_i has its request period equal to T_i and its execution time equal to C_i, while the backup copy β_i has the same request period T_i but an execution time D_i ≠ C_i in general. Although the fault-tolerant algorithm to be proposed works also when D_i is greater than or equal to C_i, in practice D_i is smaller than C_i, since backup copies usually provide a reduced functionality in a smaller execution time than the primary copies.
The primary copy of a task is always executed, while its backup copy β_i is executed according to β_i's status, which can be active or passive. If the status is active, then β_i is always executed, while if it is passive, then β_i is executed only when the primary copy fails. In other words, although both active and passive copies of the primary tasks are statically assigned to processors, passive backup copies are actually executed only when a failure of the corresponding primary copy occurs.
Each passive copy β_i is informed of the completion of τ_i at every occurrence of the periodic task by means of a message that the processor running τ_i sends in each period [hT_i, (h+1)T_i) by τ_i's completion time to the processor assigned to the passive copy β_i. This message is small: since it must contain the indices of the primary task and of the sender and receiver processors, its size is O(log n) bits. If the message is not received by a certain due time (to be specified in Section 3), a failure on the processor running τ_i is assumed and the passive copy β_i is scheduled. The overhead needed for such processor failure detections is mainly given by the short-message latency of the communication subsystem employed. In particular, with current off-the-shelf technology, this overhead can be estimated in the order of a few microseconds and is assumed to be included in the execution time of the primary copies. As for active copies, no implicit or explicit synchronization is assumed with their primary copies, since an active copy can run before, after, or concurrently with its primary copy.
2.3 The Rate-Monotonic Algorithm
Liu and Layland [10] proposed a fixed-priority scheduling algorithm, called Rate-Monotonic (RM), for solving the (non-fault-tolerant) problem stated in Section 2.1 on a single-processor system, that is, when m = 1. In their algorithm, each task is assigned a priority according to its request rate (the inverse of its request period): tasks with short periods are assigned high priorities. At any instant of time, a pending task with the highest priority is scheduled. A currently running task with lower priority is preempted whenever a request of higher priority occurs, and the interrupted task is resumed later.
As an example, consider tasks τ_1 and τ_2 to be scheduled on one processor, and let (C_1, T_1) = (1, 3) and (C_2, T_2) = (3, 5). Task τ_1 has higher priority than τ_2, and the first request of τ_1 is scheduled during the time interval [0, 1). Then the first request of τ_2 is scheduled during [1, 3). At time 3, the second request of τ_1 arrives, τ_2 is preempted, and τ_1 is scheduled during [3, 4). Then τ_2 is resumed and scheduled during [4, 5); and so on.
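The two-task RM example above can be reproduced with a minimal discrete-time simulation; this sketch assumes integer execution times and periods, and the function name `rm_schedule` is illustrative only.

```python
def rm_schedule(tasks, horizon):
    """tasks: list of (C, T); returns the index of the task running
    at each integer time unit, or None when the processor is idle."""
    # RM priority: the shorter the period, the higher the priority.
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][1])
    remaining = [0] * len(tasks)          # unfinished work per task
    timeline = []
    for t in range(horizon):
        for i, (C, T) in enumerate(tasks):
            if t % T == 0:                # a new periodic request arrives
                remaining[i] = C
        # Run the highest-priority task with pending work (preemptive).
        running = next((i for i in order if remaining[i] > 0), None)
        if running is not None:
            remaining[running] -= 1
        timeline.append(running)
    return timeline

# tau_1 = (1, 3), tau_2 = (3, 5), as in the example above.
print(rm_schedule([(1, 3), (3, 5)], 6))  # [0, 1, 1, 0, 1, 1]
```

The printed timeline matches the schedule described in the text: τ_1 in [0, 1), τ_2 in [1, 3), τ_1 in [3, 4), and τ_2 resumed in [4, 5).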
Liu and Layland proved the following two important results concerning fixed-priority scheduling algorithms.
Theorem 1. The largest response time for any periodic request of τ_i occurs whenever τ_i is requested simultaneously with the requests for all higher priority tasks.
Theorem 2. A periodic task set can be scheduled by a fixed-priority algorithm provided that the deadline of the first request of each task starting from a critical instant (i.e., an instant in which all tasks are simultaneously requested) is met.
For example, a critical instant occurs when all tasks are in phase at time zero, which is called critical instant phasing, because it is the phasing that results in the longest response time for the first request of each task. As a consequence, to check the schedulability of any task τ_i, it is sufficient to check whether τ_i is schedulable in its first period [0, T_i] when it is scheduled with all higher priority tasks.
2.4 The Completion Time Test
From Theorems 1 and 2, the following necessary and
sufficient schedulability criterion was derived by Joseph
and Pandya [5], as discussed also in [8].
Theorem 3. Let the periodic tasks τ_1, ..., τ_n be given in priority order and scheduled by a fixed-priority algorithm. All the periodic requests of τ_i will meet the deadlines under all task phasings if and only if:

min_{0 < t ≤ T_i} ( Σ_{1 ≤ k ≤ i} C_k ⌈t/T_k⌉ ) / t ≤ 1.

The entire set of tasks τ_1, ..., τ_n is schedulable under all task phasings if and only if:

max_{1 ≤ i ≤ n} min_{0 < t ≤ T_i} ( Σ_{1 ≤ k ≤ i} C_k ⌈t/T_k⌉ ) / t ≤ 1.
The minimization required in Theorem 3 is easy to compute in the case of the Rate-Monotonic algorithm. In fact, t needs to be checked only a finite number of times, as explained below.
Let Γ = {τ_1, ..., τ_i}, with T_1 ≤ ... ≤ T_i, be a set of tasks in phase at time zero; the cumulative work on a processor required by the tasks in Γ during [0, t] is:

W(t, Γ) = Σ_{τ_k ∈ Γ} C_k ⌈t/T_k⌉.

Create the sequence of times S_0, S_1, ... with S_0 = Σ_{τ_k ∈ Γ} C_k, and with S_{l+1} = W(S_l, Γ). If for some l, S_l = S_{l+1} ≤ T_i, then τ_i is schedulable. Otherwise, if T_i < S_l for some l, task τ_i is not schedulable. Note that S_l is exactly equal to the minimum t, 0 < t < T_i, for which Σ_{1 ≤ k ≤ i} C_k ⌈t/T_k⌉ = t, as required in Theorem 3. This schedulability test is called the Completion Time Test (CTT).
As an immediate consequence of the above theorems, the following property holds:
Property 1. Let the Completion Time Test be satisfied for τ_1, ..., τ_i, and let S_l = S_{l+1} ≤ T_i for some l. Then in any period [hT_i, (h+1)T_i], with h integer, τ_i will complete no later than the instant hT_i + S_l.
For the sake of clarity, the quantity S_l will be denoted in the following by η_i, since such a quantity represents the worst-case completion time of task τ_i in any request period T_i.
As an example of the use of CTT, consider again tasks τ_1 and τ_2 with (C_1, T_1) = (1, 3) and (C_2, T_2) = (3, 5), and let us check the schedulability of τ_2:

S_0 = 1 + 3 = 4;
S_1 = W(4, {τ_1, τ_2}) = 1⌈4/3⌉ + 3⌈4/5⌉ = 5;
S_2 = W(5, {τ_1, τ_2}) = 1⌈5/3⌉ + 3⌈5/5⌉ = 5.

Since S_1 = S_2 = T_2 = 5, all the periodic requests of τ_2 will meet their deadlines.
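The fixed-point iteration of the Completion Time Test can be sketched directly from its definition; `completion_time` is an illustrative name, and tasks are assumed to be given as (C, T) pairs sorted by RM priority.

```python
import math

def completion_time(tasks, i):
    """Worst-case completion time of task i (0-based) among tasks
    (C, T) sorted by increasing period; returns None if the Completion
    Time Test fails for task i."""
    C_i, T_i = tasks[i]
    hp = tasks[: i + 1]                  # task i plus all higher-priority tasks
    S = sum(C for C, _ in hp)            # S_0: total work of hp tasks
    while True:
        if S > T_i:
            return None                  # deadline T_i exceeded: unschedulable
        nxt = sum(C * math.ceil(S / T) for C, T in hp)  # S_{l+1} = W(S_l, Gamma)
        if nxt == S:
            return S                     # fixed point reached: S_l = S_{l+1}
        S = nxt

tasks = [(1, 3), (3, 5)]                 # the worked example above
print([completion_time(tasks, i) for i in range(2)])  # [1, 5]
```

For the example task set, the iteration reproduces S_1 = S_2 = 5 for τ_2, so both tasks pass the test.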
It is worth noting that, by Theorem 3, the schedulability of lower priority tasks does not guarantee the schedulability of higher priority tasks. Therefore, in order to check the schedulability of a set of tasks, each task must pass the CTT when it is scheduled with all higher priority tasks. If tasks are picked in priority order, the schedulability test can proceed in an incremental way: CTT is performed considering tasks τ_1, ..., τ_i on the period [0, T_i], for i = 1, ..., n, that is, by adding one task τ_i at a time to the preceding tasks τ_1, ..., τ_{i-1}, without the need to test again the schedulability of τ_1, ..., τ_{i-1}. In this way, as soon as η_i is computed, η_i will not change anymore, since only lower priority tasks will be considered later.
2.5 The Rate-Monotonic First-Fit
Dhall and Liu [3] generalized the RM algorithm to accommodate multiprocessor systems. In particular, they proposed the so-called Rate-Monotonic First-Fit (RMFF) algorithm. It is a partitioning algorithm, where tasks are first assigned to processors following the RM priority order and then all the tasks assigned to the same processor are scheduled with the RM algorithm. Let T_1 ≤ T_2 ≤ ... ≤ T_n; the algorithm acts as follows. For i = 1, 2, ..., n, the generic task τ_i is assigned to the first processor P_j such that τ_i and the other tasks already assigned to P_j can be scheduled on P_j according to RM. If no such processor exists, the task is assigned to a new processor. Dhall and Liu showed that, using a schedulability condition weaker than CTT, RMFF uses about 2.33U processors in the worst case, where U is the load of the task set. The 2.33 worst-case bound was recently lowered to 1.75 by Burchard, Liebeherr, Oh, and Son [1], using a schedulability condition stronger than that used in [3], but without using the RM priority order for task assignment, and partially using CTT. In practice, however, RMFF remains competitive for its simplicity and efficiency. It employs the same priority order both for assigning tasks to processors and for scheduling tasks on each processor, and requires on average a number of processors very close to U when CTT is used to check for schedulability on each processor, as confirmed also by the simulation experiments exhibited in Section 5. Moreover, as shown in Section 4, it can be extended in a clean way to tolerate hardware and software failures.
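The first-fit partitioning step described above can be sketched by combining a CTT check with a linear scan over the processors; `ctt_ok` and `rmff` are invented names, and tasks are assumed to arrive already sorted by increasing period.

```python
import math

def ctt_ok(assigned):
    """True if every task (C, T) in 'assigned' (sorted by increasing
    period) passes the Completion Time Test on one processor."""
    for i, (C_i, T_i) in enumerate(assigned):
        hp = assigned[: i + 1]
        S = sum(C for C, _ in hp)               # S_0
        while S <= T_i:
            nxt = sum(C * math.ceil(S / T) for C, T in hp)
            if nxt == S:
                break                           # fixed point within T_i
            S = nxt
        if S > T_i:
            return False
    return True

def rmff(tasks):
    """Assign each task, in RM priority order, to the first processor
    on which the whole set remains schedulable (first-fit)."""
    processors = []
    for task in tasks:
        for p in processors:
            if ctt_ok(p + [task]):
                p.append(task)
                break
        else:                                   # no processor fits: open a new one
            processors.append([task])
    return processors

print(rmff([(1, 3), (2, 4), (3, 5)]))
```

On this small instance, (1, 3) and (2, 4) share the first processor, while (3, 5) cannot join them and is placed on a second one.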
3 OVERVIEW OF THE FAULT-TOLERANT RMFF ALGORITHM
This section provides an informal high-level description of the proposed Fault-Tolerant Rate-Monotonic First-Fit (FTRMFF) algorithm. The algorithm analysis is done in the next section. For the sake of simplicity, only the extension to tolerate one processor failure is discussed hereafter. Extensions to support many processor failures or software failures will be discussed in Sections 4.5 and 4.6, respectively.
In the FTRMFF algorithm, primary and backup copies of different tasks can be assigned to the same processor. Of course, in order to tolerate a processor failure, the primary copy and the backup copy of the same task should not be assigned to the same processor. The proposed algorithm can be viewed as the RMFF algorithm applied to a task set including both primary and backup copies. Task copies, both primary and backup, are ordered by increasing periods, namely, the priority of a copy is equal to the inverse of its period. A tie between a primary copy τ_i and its backup copy β_i is broken by giving higher priority to τ_i. Thus task copies are indexed by decreasing RM priorities and are assigned to the processors following the order:

τ_1, β_1, τ_2, β_2, ..., τ_n, β_n.
CTT is used to check whether a task copy can be assigned to a processor. Thanks to Property 1 of Section 2, CTT also provides enough information to decide whether a backup copy should be active or passive. Indeed, while checking for schedulability of a primary copy τ_i, CTT also computes its worst-case completion time η_i. If the schedulability test for τ_i succeeds, that is, when η_i ≤ T_i, then for each request period there are at least T_i - η_i time units to schedule β_i as a passive copy on another processor. Let B_i = T_i - η_i be the recovery time of the backup copy β_i. If B_i ≥ D_i, then β_i may be scheduled as a passive copy, since there is enough time to execute β_i after τ_i if a processor failure prevents τ_i from being completed; otherwise β_i must be scheduled as an active copy. The algorithm prefers to schedule a backup copy as a passive copy whenever possible, so as to overbook each processor with more passive copies whose primary copies are assigned to different processors.
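The active/passive decision just described reduces to a one-line comparison on the recovery time B_i = T_i - η_i; the function and argument names below are illustrative, not from the paper.

```python
def backup_status(T_i, eta_i, D_i):
    """Decide a backup copy's status from its recovery time.
    T_i: period, eta_i: worst-case completion time of the primary
    (from CTT), D_i: execution time of the backup copy."""
    B_i = T_i - eta_i            # slack left in the period after the primary
    return "passive" if B_i >= D_i else "active"

# (C, T) = (1, 3) with eta = 1 leaves B = 2 >= D = 1: passive suffices.
print(backup_status(3, 1, 1))   # passive
# (C, T) = (3, 5) with eta = 5 leaves no slack: active duplication is forced.
print(backup_status(5, 5, 1))   # active
```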
It is worth noting that, although tasks could be assigned to processors following any order, considering task copies by decreasing RM priorities greatly simplifies the algorithm. Indeed, such an ordering is the same ordering used by the RM algorithm to schedule the tasks assigned to each processor. Therefore, when a task τ_i is assigned to a processor, only lower priority tasks will be assigned later to the same processor, and the time intervals for τ_i's execution on the processor will remain unchanged. In particular, the worst-case completion time η_i will also remain unchanged. This makes it possible to determine whether the backup copy β_i of τ_i can be scheduled as a passive copy. Clearly, with another ordering a higher priority task could be assigned to the same processor after τ_i. In this case, η_i would need to be recomputed and β_i would have to be reassigned and rescheduled. This justifies the τ_1, β_1, τ_2, β_2, ..., τ_n, β_n order of assignment. Moreover, since the algorithm generalizes RMFF, it assigns a backup copy β_i, either passive or active, to the first processor P_j such that τ_i is not assigned to P_j, and β_i and the other primary and backup copies already assigned to P_j can be scheduled on P_j according to the RM algorithm for a single processor.
To find a processor a task copy can be assigned to, however, several applications of CTT are required, which take into account the situations in which no processor fails or any processor fails. The applications of the test depend on the kind (primary/backup) of the task copy to be assigned, as well as on its status (active/passive) if the copy is a backup copy. There are three main assignment cases.
(A1) To assign a primary copy τ_i to a processor P_j, two conditions have to be checked:
. τ_i must be schedulable together with all the primary and active backup copies already assigned to P_j;
. τ_i must be schedulable together with all the primary copies already assigned to P_j and all the active and passive backup copies assigned to P_j such that their corresponding primary copies are all assigned to the same processor P_f; this condition must be considered for all P_f ≠ P_j.
The first condition takes into account the situation in which no failure occurs, while the second one takes into account the situation in which any processor other than P_j fails. Thus, as many applications of CTT as the total number of processors are required in the worst case to determine whether τ_i can be assigned to P_j. Note that the second condition uses the space reserved on P_j for active copies whose primary copies are not assigned to P_f, since only one processor failure is assumed to be tolerated.
(A2) To assign an active backup copy β_i to a processor P_j, assume that the primary copy τ_i is already assigned to processor P_p ≠ P_j; two conditions also have to be checked:
. β_i must be schedulable together with all the primary and active backup copies already assigned to P_j;
. β_i must be schedulable together with all the primary copies already assigned to P_j and all the active and passive backup copies assigned to P_j such that their corresponding primary copies are all assigned to P_p.
These conditions are analogous to those of (A1), with the difference that the second one takes into account the situation where the failed processor is the one running the primary copy τ_i. Thus only two applications of CTT are required to determine whether β_i can be assigned to P_j.
(A3) Finally, to assign a passive backup copy β_i to a processor P_j, assuming again that the primary copy τ_i is already assigned to processor P_p ≠ P_j, only one condition has to be tested, which is identical to the second condition of (A2). Thus only one application of CTT is needed to determine whether β_i can be assigned to P_j.
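The copy sets that each CTT invocation of case (A1) must cover can be sketched as follows; the record fields and function name are invented for illustration, and each returned set would then be passed to the Completion Time Test together with the new primary copy.

```python
def a1_check_sets(copies_on_Pj, other_processors):
    """(A1): the task-copy sets on P_j that must each pass CTT before
    a new primary copy may be assigned to P_j. Returns
    1 + len(other_processors) sets, matching the worst case of as many
    CTT applications as there are processors."""
    # No-failure check: primaries plus active backups already on P_j.
    no_failure = [c for c in copies_on_Pj
                  if c["kind"] == "primary" or c["status"] == "active"]
    sets = [no_failure]
    # One check per potential failed processor P_f != P_j: primaries on
    # P_j plus every backup on P_j whose primary copy resides on P_f.
    for P_f in other_processors:
        sets.append([c for c in copies_on_Pj
                     if c["kind"] == "primary"
                     or c["primary_proc"] == P_f])
    return sets

copies = [
    {"task": 1, "kind": "primary", "status": None,      "primary_proc": "P1"},
    {"task": 2, "kind": "backup",  "status": "active",  "primary_proc": "P2"},
    {"task": 3, "kind": "backup",  "status": "passive", "primary_proc": "P3"},
]
for s in a1_check_sets(copies, ["P2", "P3"]):
    print([c["task"] for c in s])   # [1, 2] / [1, 2] / [1, 3]
```

Note how the passive copy of task 3 is counted only in the check for a failure of P3, which is exactly what makes passive overbooking possible.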
As soon as task copies are assigned to processors, all the copies assigned to the same processor are scheduled with the RM algorithm. However, in the absence of failures, each processor executes only its primary copies and active backup copies. When the processor assigned to β_i does not receive the synchronization message of τ_i by time hT_i + η_i, a failure of the processor running τ_i is assumed and the passive copy β_i is executed. To understand how to recover from a failure, assume τ_i is assigned to processor P_f, which is detected at time φ to be failed, with φ belonging to [hT_i, (h+1)T_i) for some h. If β_i is an active copy scheduled on a processor P_j, then β_i will continue to be executed and no further action is needed for τ_i. If β_i is passive, then β_i becomes active on P_j starting either from φ, if the execution of τ_i was not completed by P_f before φ, or from (h+1)T_i, if the execution of τ_i was already completed before φ. In other words, if φ > hT_i + η_i, then τ_i was completed before P_f's failure and there is no need to schedule β_i by time (h+1)T_i. If φ ≤ hT_i + η_i then, in order to recover the lost computation of τ_i, β_i must be executed for the first time during the interval [φ, (h+1)T_i), which in general is shorter than T_i. It will be shown in the next section that β_i, the primary copies of P_j, and the backup copies of P_j meet their deadlines even in this case.
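The recovery rule for a passive copy reduces to a comparison between the detection time and the primary's worst-case completion time within the current period; `passive_recovery_start` and its arguments are invented names for this sketch.

```python
def passive_recovery_start(phi, h, T_i, eta_i):
    """P_f fails and the failure is detected at time phi, with phi in
    [h*T_i, (h+1)*T_i). Return the time from which the passive copy
    becomes active."""
    if phi > h * T_i + eta_i:
        # The primary completed before the failure: nothing is lost in
        # this period, so the backup takes over from the next request.
        return (h + 1) * T_i
    # The primary's computation was lost: the backup must run within
    # the (possibly shortened) interval [phi, (h+1)*T_i).
    return phi

# (C, T) = (3, 5) with eta = 5, failure detected at phi = 7 in [5, 10):
print(passive_recovery_start(7, 1, 5, 5))   # 7, i.e., recover within [7, 10)
# Same task, failure detected at phi = 11 in [10, 15) after eta... wait:
# with eta = 2 and phi = 13 > 10 + 2, the primary had finished:
print(passive_recovery_start(13, 2, 5, 2))  # 15, the next request time
```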
4 ANALYSIS OF THE FAULT-TOLERANT RMFF ALGORITHM
In this section, necessary and sufficient schedulability
criteria are proved which extend Theorem 3 to schedule a
set of primary and backup copies to recover from one
processor failure. Based on the proposed criteria, a fault-
tolerant extension of RMFF is derived and proved to be
correct.
4.1 Schedulability Criteria
In order to extend Theorem 3, consider a generic task set
containing both primary and backup copies which must be
938 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 9, SEPTEMBER 1999

Citations
Proceedings ArticleDOI

Real-Time Task Replication for Fault Tolerance in Identical Multiprocessor Systems

TL;DR: This paper considers the replication of periodic hard real-time tasks in identical multiprocessor environments and develops greedy algorithms with a 2-approximation ratio and an asymptotic polynomial-time approximation scheme to minimize the maximum utilization in a system with a specified number of processors.
Proceedings ArticleDOI

Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems

TL;DR: A novel task allocation algorithm for passively replicated DRE systems to meet their real-time and fault-tolerance QoS properties while consuming significantly less resources is described.
Journal ArticleDOI

Exploiting primary/backup mechanism for energy efficiency in dependable real-time systems

TL;DR: In this article, a dynamic priority based energy-efficient fault-tolerance scheduling algorithm for periodic real-time tasks running on multiprocessor systems by exploiting the primary/backup technique while considering the negative effects of the widely deployed Dynamic Voltage and Frequency Scaling (DVFS) on transient faults is proposed.
Journal ArticleDOI

Real-time scheduling algorithm for safety-critical systems on faulty multicore environments

TL;DR: A scheme to determine for each task the number of backups that should run in active redundancy in order to increase the probability of meeting all the deadlines is proposed.
Journal ArticleDOI

Preference-oriented real-time scheduling and its application in fault-tolerant systems

TL;DR: The results show that, compared to the well-known EDF scheduler, the scheduling overheads of SEED and POED are higher (but still manageable) due to the additional consideration of tasks' preferences, yet they achieve the preference-oriented execution objectives more successfully than EDF.
References
Book

Scheduling algorithms for multiprogramming in a hard real-time environment

TL;DR: In this paper, the problem of multiprogram scheduling on a single processor is studied from the viewpoint of the characteristics peculiar to the program functions that need guaranteed service, and it is shown that an optimum fixed priority scheduler possesses an upper bound to processor utilization which may be as low as 70 percent for large task sets.

Sequencing and scheduling: algorithms and complexity

TL;DR: This survey focuses on the area of deterministic machine scheduling, and reviews complexity results and optimization and approximation algorithms for problems involving a single machine, parallel machines, open shops, flow shops and job shops.
Journal ArticleDOI

On a Real-Time Scheduling Problem

TL;DR: This work studies the problem of scheduling periodic-time-critical tasks on multiprocessor computing systems and considers two heuristic algorithms that are easy to implement and yield a number of processors that is reasonably close to the minimum number.
Frequently Asked Questions (2)
Q1. What are the contributions in "Fault-tolerant rate-monotonic first-fit scheduling in hard-real-time systems" ?

In this paper, fault tolerance is implemented by using a novel duplication technique where each task scheduled on a processor has either an active backup copy or a passive backup copy scheduled on a different processor. First, the paper considers the ability of the widely-used Rate-Monotonic scheduling algorithm to meet the deadlines of periodic tasks in the presence of a processor failure. Then, the paper extends the well-known Rate-Monotonic First-Fit assignment algorithm, where all the task copies, including the backup copies, are considered in Rate-Monotonic priority order and assigned to the first processor in which they fit. Moreover, the space allocated to active copies is reclaimed as soon as a failure is detected.

However, further research is needed, e.g., to derive an analytical worst-case bound on the number of processors used by the proposed FTRMFF algorithm, or to devise schedulability conditions which are weaker but simpler than the Completion Time Test, such as those proposed in [1]. This optimization is left for further work. As a subject for future research, the combined duplication scheme proposed in the present paper could be used to extend the Rate-Monotonic First-Fit algorithm in order to tolerate failures also in the presence of resource reclaiming and task synchronization. Finally, further research could deal with assignment strategies which are different from those considered in this paper.