Fault-tolerant rate-monotonic first-fit scheduling in hard-real-time systems
Summary (4 min read)
1 INTRODUCTION
- Throughout industrial computing, there is an increasing demand for more complex and sophisticated hard-real-time computing systems.
- An active copy presents the advantages of requiring no synchronization with its primary copy (it can run before, after, or concurrently with the other copy) and of having a larger time window for execution, namely, the whole period of the task.
- In particular, this paper extends the RMFF algorithm to tolerate failures under the assumption that processors fail in a fail-stop manner.
2 BACKGROUND
- This section gives a formal definition of the scheduling problem and a precise specification of the fault tolerance model.
- Moreover, important properties of the well-known RM, CTT, and RMFF algorithms are recalled.
2.1 The Scheduling Problem
- The requests for τi are periodic, with constant interval Ti between every two consecutive requests, and τi's first request occurs at time 0.
- The worst case execution time for all the requests of τi is constant and equal to Ci, with Ci ≤ Ti. Periodic tasks τ1, ..., τn are independent, that is, the requests of any task do not depend on the execution of the other tasks.
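The task model above can be written down as a small data structure. The following is only a sketch under stated assumptions: the class name `PeriodicTask`, the integer time units, and the definition of the load as the sum of Ci/Ti are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PeriodicTask:
    """A periodic task with worst-case execution time c (Ci) and period t (Ti).

    Hypothetical helper class: names and integer time units are assumptions.
    """
    c: int  # worst-case execution time Ci
    t: int  # request period Ti; the deadline is the next request

    def __post_init__(self):
        # The model requires Ci <= Ti (and positive values).
        if not (0 < self.c <= self.t):
            raise ValueError("need 0 < C <= T")

    def release_times(self, horizon):
        """Request instants in [0, horizon): 0, T, 2T, ..."""
        return list(range(0, horizon, self.t))

    @property
    def utilization(self):
        """Fraction of one processor used by this task (assumed Ci/Ti)."""
        return Fraction(self.c, self.t)


tasks = [PeriodicTask(1, 4), PeriodicTask(2, 6)]
load = sum(task.utilization for task in tasks)  # assumed definition of the load
```

Using `Fraction` keeps the load exact, which avoids rounding surprises when comparing it against a bound.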
2.2 The Fault-Tolerant Model
- The fault-tolerant scheduling problem consists of finding a schedule for the tasks so as to satisfy the following additional condition: (S4) fault tolerance is guaranteed, namely, conditions (S1)-(S3) are verified even in the presence of failures.
- In order to achieve fault tolerance, two copies for each task are used, called primary and backup copies.
- In practice Di is smaller than Ci, since backup copies usually provide a reduced functionality in a smaller execution time than the primary copies.
- Since it must contain the indices of the primary task and of the sender and receiver processors, its size is O(log n) bits, so the message is small.
- The overhead needed for such processor failure detections is mainly given by the short-message latency of the communication subsystem employed.
2.3 The Rate-Monotonic Algorithm
- Liu and Layland [10] proposed a fixed-priority scheduling algorithm, called Rate-Monotonic (RM), for solving the (nonfault-tolerant) problem stated in Section 2.1 on a single processor system, that is, when m = 1.
- At any instant of time, a pending task with the highest priority is scheduled.
- Liu and Layland proved the following two important results concerning fixed-priority scheduling algorithms.
- The largest response time for any periodic request of τi occurs whenever τi is requested simultaneously with the requests for all higher priority tasks.
- A critical instant occurs when all tasks are in phase at time zero, which is called critical instant phasing, because it is the phasing that results in the longest response time for the first request of each task.
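The critical-instant property can be observed with a small discrete-time simulation. This is a minimal sketch, assuming integer time units, tasks given as hypothetical `(C, T)` pairs, and the standard RM rule that a shorter period means a higher priority; the function name `simulate_rm` is illustrative.

```python
def simulate_rm(tasks, horizon):
    """Simulate preemptive RM on one processor for `horizon` time units.

    tasks: list of (C, T) pairs, all first requested at time 0
    (critical instant phasing). Returns the largest observed response
    time of each task, in the order the tasks were given.
    """
    # RM priority order: shorter period first (indices into `tasks`).
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][1])
    remaining = [0] * len(tasks)   # unfinished work of the current request
    released = [0] * len(tasks)    # release time of the current request
    worst = [0] * len(tasks)       # largest response time seen so far

    for now in range(horizon):
        for i, (c, t) in enumerate(tasks):
            if now % t == 0:
                if remaining[i] > 0:
                    raise RuntimeError(f"task {i} missed its deadline at {now}")
                remaining[i] = c
                released[i] = now
        # Run the pending task with the highest priority for one time unit.
        for i in order:
            if remaining[i] > 0:
                remaining[i] -= 1
                if remaining[i] == 0:
                    worst[i] = max(worst[i], now + 1 - released[i])
                break
    return worst


# Two tasks in phase at time 0, simulated over one hyperperiod (12 units).
print(simulate_rm([(1, 4), (2, 6)], 12))
```

In this run the lower priority task takes 3 time units for its first request and 2 for its second, so the longest response time is the one that starts at the critical instant.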
2.4 The Completion Time Test
- From Theorems 1 and 2, the following necessary and sufficient schedulability criterion was derived by Joseph and Pandya [5], as discussed also in [8].
- This schedulability test is called Completion Time Test (CTT).
- It is worth noting that, by Theorem 3, the schedulability of lower priority tasks does not guarantee the schedulability of higher priority tasks.
- Therefore, in order to check the schedulability of a set of tasks, each task must get through the CTT when it is scheduled with all higher priority tasks.
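The test can be sketched with the usual response-time iteration associated with Joseph and Pandya. The summary does not reproduce Theorem 3, so the formulation below is the standard one and may differ in notation from the paper; tasks are hypothetical `(C, T)` pairs and the function names are illustrative.

```python
def completion_time(tasks, i):
    """Worst-case completion time of the first request of tasks[i].

    tasks must be sorted by RM priority (non-decreasing period).
    Iterates W = C_i + sum_{j < i} ceil(W / T_j) * C_j to a fixed point.
    Returns None if the completion time would exceed T_i.
    """
    c_i, t_i = tasks[i]
    higher = tasks[:i]
    w = c_i + sum(c for c, _ in higher)  # starting point of the iteration
    while w <= t_i:
        # -(-w // t) is integer ceil(w / t)
        nxt = c_i + sum(-(-w // t) * c for c, t in higher)
        if nxt == w:
            return w
        w = nxt
    return None


def ctt_schedulable(tasks):
    """Each task must pass the test together with all higher priority tasks."""
    ordered = sorted(tasks, key=lambda ct: ct[1])
    return all(completion_time(ordered, i) is not None
               for i in range(len(ordered)))


print(ctt_schedulable([(1, 4), (2, 6)]))  # passes the test
print(ctt_schedulable([(2, 4), (3, 6)]))  # second task cannot finish by time 6
```

The iteration terminates because W never decreases and is bounded by T_i, which is why integer time units are convenient here.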
2.5 The Rate-Monotonic First-Fit
- Dhall and Liu [3] generalized the RM algorithm to accommodate multiprocessor systems.
- In particular, they proposed the so called Rate-Monotonic First-Fit (RMFF) algorithm.
- It is a partitioning algorithm, where tasks are first assigned to processors following the RM priority order and then all the tasks assigned to the same processor are scheduled with the RM algorithm.
- Dhall and Liu showed that, using a schedulability condition weaker than CTT, RMFF uses about 2.33U processors in the worst case, where U is the load of the task set.
- In practice, however, RMFF remains competitive, for its simplicity and efficiency.
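The partitioning step can be sketched as follows. This is an illustration only: it uses the completion-time iteration as the fitting test, whereas Dhall and Liu used a weaker condition, and the names `fits` and `rmff` and the `(C, T)` tuple representation are assumptions.

```python
def fits(assigned, task):
    """Check whether `task` is schedulable as the lowest priority task
    on a processor that already holds `assigned` (all with periods <= its own).

    Uses the standard response-time iteration (an assumption of this sketch).
    """
    c_i, t_i = task
    w = c_i + sum(c for c, _ in assigned)
    while w <= t_i:
        nxt = c_i + sum(-(-w // t) * c for c, t in assigned)  # ceil division
        if nxt == w:
            return True
        w = nxt
    return False


def rmff(tasks):
    """Assign (C, T) tasks to processors, first fit in RM priority order.

    Because tasks are considered by non-decreasing period, a newly placed
    task never disturbs the tasks already on a processor, so only the new
    task needs to be tested. Assumes C <= T for every task.
    """
    processors = []
    for task in sorted(tasks, key=lambda ct: ct[1]):
        for proc in processors:
            if fits(proc, task):
                proc.append(task)
                break
        else:
            processors.append([task])  # open a new processor
    return processors


for index, proc in enumerate(rmff([(2, 4), (3, 6), (1, 8), (2, 12)])):
    print(f"P{index}: {proc}")
```

Assigning in priority order is what makes the single check per placement sufficient; with another ordering, the tasks already on the processor would have to be retested.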
3 OVERVIEW OF THE FAULT-TOLERANT RMFF ALGORITHM
- This section provides an informal high-level description of the proposed Fault-Tolerant Rate-Monotonic First-Fit algorithm.
- The algorithm prefers to schedule a backup copy as a passive copy whenever possible, so as to overbook each processor with more passive copies whose primary copies are assigned to different processors.
- Clearly, with another ordering a higher priority task can be assigned to the same processor after τi.
- The copy being assigned must be schedulable together with all the primary and active backup copies already assigned to Pj.
- These conditions are analogous to those of (A1), with the difference that the second one takes into account the situation where the failed processor is the one running the primary copy τi.
4.1 Schedulability Criteria
- In order to extend Theorem 3, consider a generic task set containing both primary and backup copies which must be scheduled all together on a single processor.
- active(Pj) includes the active backup copies assigned to processor Pj.
- Pj be the set of periodic tasks given in priority order which are assigned to processor Pj. Consider now the case that a failure of processor Proof.
- Hence, the proof follows from Theorem 3.
4.2 Fault-Tolerant CTT
- Based on Theorems 4 and 5, two kinds of schedulability tests are needed, one to check for schedulability in the absence of failures, and the other to check for schedulability after a processor failure.
- Pj, since the recovery from the failure of any processor other than Pj must be taken into account.
4.3 Fault-Tolerant RMFF
- The first step assigns the primary copy τi to the first processor in which it fits.
- The second step establishes the recovery time Bi and the status of the corresponding backup copy.
- Thus, duplicating on two sets of processors the schedule for the nonfault-tolerant case requires at least four processors to tolerate one failure.
- The proposed FTRMFF algorithm, instead, tolerates one failure using three processors only.
- The procedure FTRMFF-Assignment is executed off-line and requires O(nm²) schedulability tests to be performed.
4.4 Recovery from a Processor Failure
- Once an assignment is found by the FTRMFF algorithm, each processor.
- The uncompleted tasks assigned to Pf are recovered by the remaining nonfaulty processors.
- Note also that, in any case, all the active backup copies of primary tasks scheduled on the nonfaulty processors are deallocated from Pj.
- FTRMFF-Recovery(Pf): (1) Do the following steps in parallel for all the processors [...]
- The procedure FTRMFF-Recovery is executed on-line and is very fast, since all the required sets, including passiveRecover(Pj, Pf) and recover(Pj, Pf), were previously computed off-line by the FTRMFF-Assignment procedure, which also performed all the schedulability tests.
4.5 Tolerating Many Processor Failures
- In order to tolerate many processor failures, spare processors must be employed to replace failed processors on-line.
- Pf is detected within the closest completion time of the task set primary(Pf) ∪ active(Pf) and the time interval between two consecutive failures is three times the largest task request period.
- This phase takes at most [...] During the recovery phase, all the passive copies of the uncompleted tasks assigned to Pf are executed by the nonfaulty processors only once (step 1.1.2 of FTRMFF-Replacing), and the spare processor Ps inherits [...]
- The reconfiguration phase is completed by time 2Tn.
- If there are q spare processors, q faulty processors can be replaced by means of the FTRMFF-Replacing procedure, while one additional failure can be tolerated by means of the FTRMFF-Recovery procedure.
4.6 Tolerating Software Failures
- In addition to processor failures, a hard-real-time system can fail also due to design faults in the software.
- To explain the ideas of the approach, assume that two different implementations of the same task specification are provided.
- Since the processors are assumed fail-stop, if the acceptance test fails, it signals the presence of an error in the software.
- The time to execute the acceptance test is assumed to be included in the primary copy execution time.
- One approach to implement the recovery from software failures is as follows.
5 SIMULATION EXPERIMENTS
- In order to evaluate the number of processors used by the FTRMFF algorithm for scheduling both primary and backup copies, simulation experiments are performed.
- For the chosen n and , the experiment is repeated 30 times, and the average result is computed.
- The performance metric in all the experiments is the number of processors required to assign a given task set.
- In the outcome of the experiments, the authors denote with N the number of processors required by the FTRMFF algorithm for a task set consisting of both primary and backup task copies, and with M the number of processors required by the RMFF algorithm for a task set with identical primary copies and no backup copies.
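The baseline measurement M can be sketched as a small harness that averages the processor count over 30 random task sets. Everything about the workload here is an assumption made for illustration: the period range, the hypothetical `max_load` parameter bounding Ci/Ti, the seed, and the use of the completion-time iteration as the fitting test. The FTRMFF count N is not reproduced.

```python
import random


def fits(assigned, task):
    """Response-time check for `task` as the lowest priority task (assumed test)."""
    c_i, t_i = task
    w = c_i + sum(c for c, _ in assigned)
    while w <= t_i:
        nxt = c_i + sum(-(-w // t) * c for c, t in assigned)  # ceil division
        if nxt == w:
            return True
        w = nxt
    return False


def rmff_count(tasks):
    """Number of processors used by first fit in RM priority order."""
    processors = []
    for task in sorted(tasks, key=lambda ct: ct[1]):
        for proc in processors:
            if fits(proc, task):
                proc.append(task)
                break
        else:
            processors.append([task])
    return len(processors)


def random_task_set(n, max_load, rng):
    """n random (C, T) tasks; the distributions are hypothetical."""
    tasks = []
    for _ in range(n):
        t = rng.randint(10, 500)                       # assumed period range
        c = rng.randint(1, max(1, int(max_load * t)))  # keeps C/T <= max_load
        tasks.append((c, t))
    return tasks


def average_processors(n, max_load, runs=30, seed=0):
    """Average processor count over `runs` random task sets."""
    rng = random.Random(seed)
    counts = [rmff_count(random_task_set(n, max_load, rng)) for _ in range(runs)]
    return sum(counts) / runs


print(average_processors(n=20, max_load=0.5))
```

Fixing the seed makes a run repeatable, which is convenient when the same task sets must later be fed to a second algorithm for comparison.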
6 CONCLUDING REMARKS
- This paper has considered the problem of preemptively scheduling a set of independent periodic tasks under the assumption that each task deadline coincides with the next request of the same task.
- The proposed FTRMFF algorithm extends the well-known Rate-Monotonic First-Fit scheduling algorithm to tolerate failures, uses a novel combined active/passive duplication scheme, and determines by itself which tasks should use passive duplication and which should use active duplication.
- This optimization is left for further work.
- It is worth noting that the proposed algorithm works also if some backup copies are forced to be active.
- Finally, further research could deal with assignment strategies which are different from those considered in this paper.
ACKNOWLEDGMENTS
- The C++ code used in the simulation experiments was written by Andrea Fusiello.
- This work was supported by grants from the Ministero dell'Università e della Ricerca Scientifica e Tecnologica, the Consiglio Nazionale delle Ricerche, and the Università di Trento (Progetto Speciale 1997).
Citations
Cites methods from "Fault-tolerant rate-monotonic first..."
...In the second level, the task This work was partially supported by Defense Acquisition Program Administration and Agency for Defense Development under the contract....
[...]
Cites background or methods from "Fault-tolerant rate-monotonic first..."
...The time complexity of FTRMFF is O(nm²) [3]....
[...]
...Fault-Tolerant Rate-Monotonic First-Fit (FTRMFF) was proposed in [3] using a primary-backup scheme to extend the RMFF algorithm....
[...]
...In this paper, we present a QoS-aware fault-tolerant scheduling algorithm QFTRMFF based on the existing algorithm proposed by Bertossi in [3]....
[...]
...The schedulability test functions are similar to those in [3] and are briefly described here....
[...]
References
"Fault-tolerant rate-monotonic first..." refers methods in this paper
...The proposed algorithm determines which tasks must use the active duplication and which can use the passive duplication....
[...]
"Fault-tolerant rate-monotonic first..." refers background in this paper
...A periodic task τi is completely identified by a pair (Ci, Ti), where Ci is τi's execution time and Ti is τi's request period....
[...]
...Passive copy overbooking and active copy deallocation allow many passive copies to be scheduled sharing the same time intervals on the same processor, thus reducing the total number of processors needed....
[...]
"Fault-tolerant rate-monotonic first..." refers methods in this paper
...Dhall and Liu [3] generalized the RM algorithm to...
[...]
...Liu and Layland [10] introduced the Rate-Monotonic (RM) algorithm for preemptively scheduling periodic tasks on a single processor, under the assumption that task deadlines are equal to their periods....
[...]
...Liu and Layland [10] proposed a fixed-priority scheduling algorithm, called Rate-Monotonic (RM), for solving the (nonfault-tolerant) problem stated in Section 2.1 on a single processor system, that is, when m = 1....
[...]
...RM was generalized to multiprocessor systems by Dhall and Liu [3], who proposed, among others, the Rate-Monotonic First-Fit (RMFF) heuristic....
[...]
...Liu and Layland proved the following two important results concerning fixed-priority scheduling algorithms....
[...]
Frequently Asked Questions (2)
Q2. What are the future works in "Fault-tolerant rate-monotonic first-fit scheduling in hard-real-time systems"?
However, further research is needed, e.g., to derive an analytical worst case bound on the number of processors used by the proposed FTRMFF algorithm, or to devise schedulability conditions which are weaker but simpler than the Completion Time Test, e.g., as those proposed in [1]. This optimization is left for further work. As a subject for future research, the combined duplication scheme proposed in the present paper could be used to extend the Rate-Monotonic First-Fit algorithm in order to tolerate failures also in the presence of resource reclaiming and task synchronization. Finally, further research could deal with assignment strategies which are different from those considered in this paper.