
An Efficient Fault-tolerant Scheduling Algorithm for Real-time Tasks with
Precedence Constraints in Heterogeneous Systems
Xiao Qin, Hong Jiang, David R. Swanson
Department of Computer Science and Engineering
University of Nebraska-Lincoln
Lincoln, NE 68588-0115, {xqin, jiang, dswanson}@cse.unl.edu
This work was supported by an NSF grant (EPS-0091900) and a Nebraska University Foundation grant (26-0511-0019)
Abstract
In this paper, we investigate an efficient off-line
scheduling algorithm in which real-time tasks with
precedence constraints are executed in a heterogeneous
environment. It provides more features and capabilities
than existing algorithms that schedule only independent
tasks in real-time homogeneous systems. In addition, the
proposed algorithm takes the heterogeneities of
computation, communication and reliability into account,
thereby improving the reliability. To provide fault-
tolerant capability, the algorithm employs a primary-
backup copy scheme that enables the system to tolerate
permanent failures in any single processor. In this
scheme, a backup copy is allowed to overlap with other
backup copies on the same processor, as long as their
corresponding primary copies are allocated to different
processors. Tasks are judiciously allocated to processors
so as to reduce the schedule length as well as the
reliability cost, defined to be the product of processor
failure rate and task execution time. In addition, the time
for detecting and handling a permanent fault is
incorporated into the scheduling scheme, thus making the
algorithm more practical. To quantify the combined
performance of fault-tolerance and schedulability, the
performability measure is introduced. Compared with the
existing scheduling algorithms in the literature, our
scheduling algorithm achieves an average of 16.4%
improvement in reliability and an average of 49.3%
improvement in performability.
1. Introduction
Heterogeneous distributed systems have been
increasingly used for scientific and commercial
applications, including real-time safety-critical
applications, in which the system depends not only on the
results of a computation, but also on the time instants at
which these results become available. Examples of such
applications include aircraft control, transportation
systems and medical electronics. To obtain high
performance for real-time heterogeneous systems,
scheduling algorithms play an important role. A
scheduling algorithm maps real-time tasks to processors
in the system such that deadline and response-time
requirements are met; at the same time, the system must
guarantee its functional and timing correctness even in
the presence of faults.
The proposed algorithm, referred to as eFRCD (efficient
Fault-tolerant Reliability Cost Driven Algorithm),
endeavors to comprehensively address the issues of fault-
tolerance, reliability, real-time, task precedence
constraints, and heterogeneity. To tolerate the permanent
failure of a single processor, the algorithm uses a
primary/backup technique to allocate two copies of each task to different
processors. To further improve the quality of the schedule,
a backup copy is allowed to overlap with other backup
copies on the same processor, as long as their
corresponding primary copies are allocated to different
processors. As an added measure of fault-tolerance, the
proposed algorithm also considers the heterogeneities of
computation and reliability, thereby improving the
reliability without extra hardware cost. More precisely,
tasks are judiciously allocated to processors so as to
reduce the schedule length as well as the reliability cost,
defined to be the product of processor failure rate and task
execution time. In addition, the time for detecting and
handling a permanent fault is incorporated into the
scheduling scheme, thus making the algorithm more
practical.
The rest of the paper is organized as follows. Section 2
briefly presents related work in the literature. Section 3
describes the workload and the system characteristics.
Section 4 proposes the eFRCD algorithm and the main
principles behind it, including theorems used for
presenting the algorithm. Performance evaluation is given
in Section 5. Section 6 concludes the paper by
summarizing the main contributions of this paper.
Proceedings of the International Conference on Parallel Processing (ICPP’02)
0-7695-1677-7/02 $17.00 © 2002 IEEE

2. Related work
The issue of scheduling on heterogeneous systems has
been studied in the literature in recent years. A scheduling
scheme, STDP, for heterogeneous systems was developed
in [16]. In [3,17], reliability cost was incorporated into
scheduling algorithms for tasks with precedence
constraints. However, these algorithms neither provide
fault-tolerance nor support real-time applications.
Previous work has been done to facilitate real-time
computing in heterogeneous systems. In [7], a solution for
the dynamic resource management problem in real-time
heterogeneous systems was proposed. These algorithms,
however, cannot tolerate any processor failure. Fault-
tolerance is considered in the design of real-time
scheduling algorithms to make systems more reliable.
In [6], a mechanism was proposed for supporting
adaptive fault-tolerance in a real-time system. Liberato et
al. proposed a feasibility-check algorithm for fault-
tolerant scheduling [8]. The well-known Rate-Monotonic
First-Fit assignment algorithm was extended in [2].
However, these algorithms assume that the underlying
system either is homogeneous or consists of a single
processor.
The algorithm in [1] is a real-time scheduling algorithm
for tasks with precedence constraints, but it does not
support fault-tolerance. Manimaran et al. [9] and Mosse et
al. [4] have proposed dynamic algorithms to schedule
real-time tasks with fault-tolerance requirements on
multiprocessor systems, but the tasks scheduled in their
algorithms are independent of one another and are
scheduled on-line. Martin [10] devised an algorithm on
the same system and task model as that in [4]. Oh and Son
studied a real-time and fault-tolerant scheduling algorithm
that statically schedules a set of independent tasks [12].
Two common features among these algorithms [4,8,11,
12] are that (1) tasks are independent from one another
and (2) they are designed only for homogeneous systems.
Although heterogeneous systems are considered in both
[17] and eFRCD, the latter considers fault-tolerance and
real-time tasks while the former does not consider either.
Very recently, Girault et al. proposed a real-time
scheduling algorithm for heterogeneous systems that
considers fault-tolerance and tasks with precedence
constraints [5]. This study is by far the closest to eFRCD
that the authors have found in the literature. The main
differences between [5] and eFRCD are three-fold: (a)
eFRCD considers heterogeneities in computation,
communication and reliability (defined shortly), whereas
the former considers only computational heterogeneity;
(b) the former does not take reliability cost into
consideration, whereas eFRCD is reliability-cost driven;
and (c) the former allows the concurrent execution of the
primary and backup copies of a task, while eFRCD allows
backup copies of tasks whose primary copies are
scheduled on different processors to overlap one another.
In the authors' previous work, both static [14,15] and
dynamic [13] scheduling schemes for heterogeneous real-time
systems were developed. One similarity among these
algorithms is that the Reliability Cost Driven scheme is
applied. With the exception of the FRCD algorithm [15],
the algorithms proposed in [13,14] cannot tolerate any
failure. In this paper, the FRCD algorithm [15] is
extended by relaxing the requirement that backup copies
of tasks not be allowed to overlap.
3. Workload and system characteristics
A real-time job with dependent tasks can be modelled by a
Directed Acyclic Graph (DAG), T = {V, E}, where
V = {v_1, v_2, ..., v_n} is a set of tasks and the set of edges E
represents communication among tasks. An edge
e_ij = (v_i, v_j) ∈ E indicates a message transmitted from task
v_i to v_j, and |e_ij| denotes the volume of data being sent. To
tolerate permanent faults in one processor, a primary-backup
technique is applied. Thus, each task has two copies,
namely v^P and v^B, executed sequentially on two different
processors. Without loss of generality, it is assumed that
the two copies of a task are identical; the proposed
approach also applies when the two copies of each task are different.
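The task-graph model above can be sketched directly in code. The following is a minimal illustration of the notation (V, E, |e_ij|, D(v)); the class names and example tasks are hypothetical, not from the paper.

```python
from dataclasses import dataclass, field

# Minimal sketch of the DAG model T = {V, E}. Class and variable names
# are illustrative; only the notation (V, E, |e_ij|, d(v)) follows the text.

@dataclass
class Task:
    name: str
    deadline: int                               # d(v)

@dataclass
class Job:
    tasks: dict = field(default_factory=dict)   # V: name -> Task
    edges: dict = field(default_factory=dict)   # E: (src, dst) -> |e_ij|

    def add_task(self, name, deadline):
        self.tasks[name] = Task(name, deadline)

    def add_edge(self, src, dst, volume):
        self.edges[(src, dst)] = volume         # |e_ij|: data volume

    def predecessors(self, name):               # D(v) = {v_i | (v_i, v) in E}
        return [s for (s, d) in self.edges if d == name]

job = Job()
job.add_task("v1", deadline=10)
job.add_task("v2", deadline=20)
job.add_edge("v1", "v2", volume=4)              # v1 sends 4 units to v2
print(job.predecessors("v2"))                   # ['v1']
```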
The heterogeneous system consists of a set
P = {p_1, p_2, ..., p_m} of heterogeneous processors connected by a
network. A processor communicates with other processors
through message passing. Computational heterogeneity is
modeled by a function C: V × P → Z^+, which represents
the execution time of each task on each processor; thus,
c_j(v_i) denotes the execution time of v_i on p_j.
Communicational heterogeneity is modeled by a function
M: E × P × P → Z^+. The communication time for sending a
message e_sr from v_s on p_i to v_r on p_j is determined by
w_ij × |e_sr|, where |e_sr| is the communication cost and w_ij
is the weight on the edge between p_i and p_j,
representing the delay involved in transmitting a message
of unit length between the two processors.
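As an illustration of the communication model, here is a sketch with a made-up weight matrix w; the per-unit delays are invented example values.

```python
# Communication time for a message e_sr from v_s on p_i to v_r on p_j is
# w_ij * |e_sr|. The weight matrix below is invented example data; the
# zero diagonal reflects that a message between co-located tasks needs
# no network transfer.

w = [
    [0, 2, 3],
    [2, 0, 1],
    [3, 1, 0],
]

def comm_time(i, j, volume):
    """Delay for a message of |e_sr| = volume units from p_i to p_j."""
    return w[i][j] * volume

print(comm_time(0, 1, 5))   # 2 * 5 = 10
print(comm_time(2, 2, 5))   # 0: both tasks on the same processor
```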
Given a task v ∈ V, let d(v), s(v) and f(v) denote its
deadline, scheduled start time, and finish time,
respectively, and let p(v) denote the processor to which v is
allocated. These parameters are subject to the constraints
f(v) = s(v) + c_i(v) and f(v) ≤ d(v), where p(v) = p_i. A real-time
job has a feasible schedule if, for all v ∈ V, it satisfies both
f(v^P) ≤ d(v) and f(v^B) ≤ d(v).
A k-timely-fault-tolerant (k-TFT) schedule is a
schedule in which no task deadlines are missed, despite k
arbitrary processor failures [12]. The goal of eFRCD
is to achieve 1-TFT.
The reliability cost of task v_i on p_j is defined as the
product of the failure rate λ_j of p_j and v_i's execution time on
p_j. It should be noted that reliability heterogeneity is
implied in the reliability cost, by virtue of the heterogeneity in
c_j(v_i) and λ_j. Let RC_0(R, Ψ) and RC_i(R, Ψ) (1 ≤ i ≤ m) be
the reliability cost when no processor fails and when p_i
fails, respectively, where Ψ is a given schedule and
R = {λ_1, λ_2, ..., λ_m} is the set of failure rates of the processors.
RC_0 and RC_i are determined by equations (1) and (2), respectively.

$$RC_0(R,\Psi) = \sum_{i=1}^{m} \; \sum_{p(v^P)=p_i} \lambda_i \, c_i(v) \qquad (1)$$

$$RC_i(R,\Psi) = \sum_{j=1,\, j \neq i}^{m} \; \sum_{p(v^P)=p_j} \lambda_j \, c_j(v) \;+\; \sum_{j=1,\, j \neq i}^{m} \; \sum_{p(v^B)=p_j,\; p(v^P)=p_i} \lambda_j \, c_j(v) \qquad (2)$$
In equation (2), the first summation term on the right
hand side represents the reliability cost due to tasks whose
primary copies reside in fault-free processors, while the
second summation term expresses the reliability cost due
to the backup copies of the tasks whose primary copies
reside in the failed processor.
Reliability, given in the following expression, captures
the ability of the system to complete parallel jobs in the
presence of a single permanent processor failure.

$$RL(R,\Psi) = e^{-RC(R,\Psi)} \qquad (3)$$
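A small numeric sketch of equations (1) and (3) for the fault-free case; the failure rates, execution times and schedule below are invented example values.

```python
import math

# Sketch of RC_0 (equation (1)) and RL = e^{-RC} (equation (3)).
# All numbers are invented example data.

failure_rate = {"p1": 1e-4, "p2": 5e-4}              # lambda_j
exec_time = {("v1", "p1"): 10, ("v2", "p2"): 8}      # c_j(v_i)
primary_of = {"v1": "p1", "v2": "p2"}                # p(v^P) under schedule Psi

def rc0():
    """RC_0: sum of lambda_j * c_j(v) over all scheduled primary copies."""
    return sum(failure_rate[p] * exec_time[(v, p)] for v, p in primary_of.items())

def reliability(rc):
    """RL(R, Psi) = e^{-RC(R, Psi)}."""
    return math.exp(-rc)

print(rc0())                 # 1e-4*10 + 5e-4*8 = 0.005
print(reliability(rc0()))    # e^{-0.005}, slightly below 1
```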
4. Scheduling algorithms
In this section, we present the eFRCD algorithm,
which has three objectives: (1) the total schedule
length is reduced so that more tasks can complete before
their deadlines; (2) permanent failures in one processor
can be tolerated; and (3) the system reliability is
enhanced by reducing the overall reliability cost of the
schedule.
4.1 An outline
The key to tolerating a single processor failure is to
allocate the primary and backup copies of a task to two
different processors, such that the backup copy
subsequently executes if the primary copy fails to
complete due to the failure of its processor. Not all backup
copies need to execute, even in the presence of a single
processor failure. Since only tasks allocated to the failed
processor are affected and need their backup copies to be
executed, certain backup copies can be scheduled to
overlap with one another. More precisely, a backup copy
v^B is allowed to overlap with other backup copies on the
same processor if their corresponding primary copies are
allocated to processors other than the one to which v^P is
allocated. Thus, in a feasible schedule, the primary copies
of any two tasks must not be allocated to the same
processor if their backup copies are on the same processor
and the two backup copies overlap.
This statement is formally described below.
Proposition 1. $\forall v_i, v_j \in V:\; p(v_i^B) = p(v_j^B) \wedge \big( s(v_i^B) \le s(v_j^B) < f(v_i^B) \;\vee\; s(v_j^B) \le s(v_i^B) < f(v_j^B) \big) \Rightarrow p(v_i^P) \neq p(v_j^P)$.
Fig. 1 shows an example illustrating this case. In this
example, v_i^P and v_j^P are allocated to p_1 and p_3,
respectively, and the backup copies of v_i and v_j are both
allocated to p_2. These two backup copies can be
overlapped with each other because at most one of them
will ever execute under the single-processor failure model.
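The overlapping rule of Proposition 1 can be checked mechanically. The following is a sketch, where each copy is represented by a hypothetical (processor, start, finish) tuple.

```python
# Proposition 1 as a predicate: two backup copies on the same processor
# may overlap in time only if their primary copies sit on different
# processors. Copies are (processor, start, finish) tuples (example format).

def intervals_overlap(a, b):
    return a[1] < b[2] and b[1] < a[2]

def overlap_allowed(backup_i, backup_j, primary_proc_i, primary_proc_j):
    if backup_i[0] != backup_j[0] or not intervals_overlap(backup_i, backup_j):
        return True                          # different processors or disjoint in time
    return primary_proc_i != primary_proc_j  # Proposition 1 condition

# The Fig. 1 scenario: primaries on p1 and p3, backups overlapping on p2.
print(overlap_allowed(("p2", 4, 9), ("p2", 6, 11), "p1", "p3"))  # True
# Same primary processor: the overlap would violate Proposition 1.
print(overlap_allowed(("p2", 4, 9), ("p2", 6, 11), "p1", "p1"))  # False
```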
The algorithm schedules tasks in three main steps.
First, tasks are ordered by their deadlines in
non-decreasing order, such that tasks with tighter
deadlines have higher priorities. Second, the primary
copies are scheduled. Finally, the backup copies are
scheduled in a similar manner to the primary copies,
except that they may be overlapped on the same
processors to reduce the schedule length. More specifically,
in the second and third steps, the scheduling of each task
must satisfy the following three conditions: (1) its
deadline should be met; (2) the processor allocation
should lead to the minimum increase in overall reliability
cost among all processors satisfying condition (1); and (3)
it should be able to receive messages from all its
predecessors. In addition to these conditions, each backup
copy must satisfy three extra conditions: (i) it is
allocated to a processor different from the one assigned to
its primary copy; (ii) its start time is later than the finish
time of its primary copy plus the fault detection time δ;
and (iii) it is allowed to overlap with other backup copies
on the same processor only if their primary copies are
allocated to different processors. Conditions (i) and (ii)
can be formally described by the following proposition.
Proposition 2. A schedule is 1-TFT if $\forall v \in V:\; p(v^P) \neq p(v^B) \wedge s(v^B) \geq f(v^P) + \delta$.
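Conditions (i) and (ii) can likewise be expressed as a per-task check; δ and the times below are example values.

```python
# Proposition 2 per task: the backup copy runs on a different processor
# than the primary, and cannot start until the primary's finish time
# plus the fault detection/handling time delta. Values are examples.

DELTA = 2   # fault detection time delta (illustrative)

def satisfies_prop2(primary_proc, primary_finish, backup_proc, backup_start,
                    delta=DELTA):
    return primary_proc != backup_proc and backup_start >= primary_finish + delta

print(satisfies_prop2("p1", 10, "p2", 12))  # True: 12 >= 10 + 2
print(satisfies_prop2("p1", 10, "p2", 11))  # False: backup starts too early
print(satisfies_prop2("p1", 10, "p1", 15))  # False: same processor
```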
4.2 The eFRCD algorithm
To facilitate the presentation of the algorithm,
necessary notations are listed in the following table.
Table 1. Definitions of notation

D(v) — The set of predecessors of task v: D(v) = {v_i | (v_i, v) ∈ E}.
S(v) — The set of successors of task v: S(v) = {v_i | (v, v_i) ∈ E}.
F(v) — The set of feasible processors to which v^B can be allocated, determined in part by Theorem 2.
B(v) — The set of predecessors of v's backup copy, determined by Expression (7).
VQ_i — The queue of all tasks scheduled on p_i, with s(v_{q+1}) = ∞ and f(v_0) = 0.
VQ_i'(v) — The queue of all tasks scheduled on p_i that cannot overlap with the backup copy of task v, with s(v_{q+1}) = ∞ and f(v_0) = 0.
v_i ≺_f v_j — v_i is schedule-preceding v_j if and only if s(v_j) ≥ f(v_i).
v_i → v_j — v_i is message-preceding v_j if and only if v_i sends a message to v_j. Note that v_i → v_j implies v_i ≺_f v_j, but not inversely.
v_i →_a v_j — v_i is execution-preceding v_j if and only if both tasks execute and v_i → v_j. Note that v_i →_a v_j implies v_i → v_j and v_i ≺_f v_j.
EAT_i^P(v) — The earliest available time on p_i for v^P.
EAT_i^B(v) — The earliest available time on p_i for v^B.
EST_i^P(v) — The earliest start time for v^P on processor p_i.
EST_i^B(v) — The earliest start time for v^B on processor p_i.

Fig. 1 Primary copies of v_i and v_j are allocated to p_1 and p_3, respectively, and backup copies of v_i and v_j are both allocated to p_2; the two backup copies overlap on p_2. [figure omitted]
A detailed pseudocode of the eFRCD algorithm is
presented below.

The eFRCD Algorithm:
Input: T = {V, E}, P, C, M, R  /* DAG, distributed system; computational, communicational and reliability heterogeneity */
Output: Schedule feasibility of T, and a viable schedule Ψ if it is feasible.

1. Sort tasks by their deadlines in non-decreasing order, subject to
   precedence constraints, and generate an ordered list OL;
2. /* Schedule primary copies of tasks */
   for each task v in OL, following the order, do  /* schedule v^P */
      2.1 s(v^P) ← ∞; rc ← ∞; VQ_i = ∅;
      2.2 for each processor p_i do  /* check if v can be allocated to p_i */
          /* Calculate EST_i^P(v), where VQ_i = {v_1, v_2, ..., v_q} is the queue of */
          /* all tasks scheduled on p_i, s(v_{q+1}) = ∞, and f(v_0) = 0 */
          2.2.1 /* Compute the earliest start time of v on p_i */
                for (j = 0 to q+1) do
                   /* check if an unoccupied time interval, interspersed among */
                   /* currently scheduled tasks, can accommodate v */
                   if s(v_{j+1}) − MAX{f(v_j), EAT_i^P(v)} ≥ c_i(v) then
                      EST_i^P(v) = MAX{f(v_j), EAT_i^P(v)}; break;
                end for
          2.2.2 /* Determine the earliest EST_i based on Equation (6) */
                if v^P, starting at EST_i^P(v), can be completed before d(v) then
                   Determine the reliability cost rc_i of v^P on p_i;
                   /* Find the minimum reliability cost */
                   if (rc_i < rc) or (rc_i = rc and EST_i^P(v) < s(v^P)) then
                      s(v^P) ← EST_i^P(v); p ← p_i; rc ← rc_i;
          end for
      2.3 if no proper processor is available for v^P then return(FAIL);
      2.4 Assign p to v, where the reliability cost of v^P on p is minimal;
          VQ_i ← VQ_i + v^P;
      2.5 Update information of messages;
   end for
3. /* Schedule backup copies of tasks */
   for each task v in the ordered list do  /* schedule the backup copy v^B */
      3.1 s(v^B) ← ∞; rc ← ∞;
      /* Determine whether v^B should be allocated to processor p_i */
      3.2 for each feasible processor p_i ∈ F(v), subject to Proposition 2 and
          Theorem 2, do
          3.2.1 /* identify tasks already scheduled on p_i that cannot overlap with v^B */
                for (v_j ∈ VQ_i) do
                   if (v_j is a primary copy) or ((v_j is a backup copy) and
                      (p(v_j^P) = p(v^P))) then  /* subject to Proposition 1 */
                      copy v_j into task queue VQ_i'(v);
          3.2.2 Determine whether v^P is a strong primary copy (using Theorem 4);
          3.2.3 /* check whether an unoccupied time interval, or a time slot occupied */
                /* by backup copies that can overlap with v^B, can accommodate v^B */
                for (all v_j in task queue VQ_i'(v)) do
                   if s(v_{j+1}) − MAX{f(v_j), EAT_i^B(v)} ≥ c_i(v) then
                      EST_i^B(v) = MAX{f(v_j), EAT_i^B(v)}; break;
                end for
          3.2.4 /* Determine the earliest EST_i based on Equation (9) */
                if v, starting at EST_i^B(v), can be completed before d(v) then
                   Determine the reliability cost rc_i of v^B on p_i;
                   /* Find the minimum rc */
                   if (rc_i < rc) or (rc_i = rc and EST_i^B(v) < s(v^B)) then
                      s(v^B) ← EST_i^B(v); p ← p_i; rc ← rc_i;
                end if
          end for
      3.3 if no proper processor is available for v^B then return(FAIL);
      3.4 Find and assign p ∈ F(v) to v, where the reliability cost of v^B on p
          is minimal; VQ_i ← VQ_i + v^B;
      3.5 Update information of messages;
      3.6 for each task v_j ∈ B(v) do  /* avoid redundant messages */
             v_j sends a message to v^B if possible (based on Theorem 1 and
             Expression (7));
      3.7 for each task v_j ∈ S(v) do  /* avoid redundant messages */
             if p(v^P) ≠ p(v_j^P) or v^P is not a strong primary copy then
                v^B sends a message to v_j^P if possible (based on Theorem 3);
   end for
   return (SUCCEED);
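The processor-selection logic of step 2.2 (deadline first, then minimum reliability cost, with earlier start time as tie-breaker) can be condensed into a few lines. This is a simplified sketch with invented inputs standing in for EST_i^P(v), c_i(v) and λ_i; it ignores precedence and message constraints.

```python
# Simplified core of step 2.2 of eFRCD for a single primary copy:
# among processors where the copy meets its deadline, choose the one
# with the minimum reliability cost lambda_i * c_i(v), breaking ties
# by the earlier start time. Returns None on FAIL.

def pick_processor(est, exec_time, lam, deadline):
    best = None                              # (rc_i, EST_i, processor index)
    for i in range(len(est)):
        if est[i] + exec_time[i] > deadline:
            continue                         # condition (1): deadline missed
        rc_i = lam[i] * exec_time[i]         # reliability cost on p_i
        cand = (rc_i, est[i], i)
        if best is None or cand[:2] < best[:2]:
            best = cand                      # condition (2): minimum rc, tie on EST
    return best

# Three processors (failure rates scaled to integers for readability);
# the processor at index 1 has the lowest reliability cost.
print(pick_processor(est=[0, 3, 1], exec_time=[10, 6, 8],
                     lam=[20, 5, 9], deadline=12))   # (30, 3, 1)
print(pick_processor(est=[0], exec_time=[10], lam=[1], deadline=5))  # None
```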
4.3 The scheduling principles
Recall that EST(v) and EAT(v) are important for
determining a proper schedule for a given task v. While both
EAT and EST indicate a time when all messages from v's
predecessors have arrived, EST additionally signifies that
the processor to which v is allocated is available for v
to start execution. In the following, we present a series of
derivations that lead to the final expressions for EAT(v)
and EST(v).
If only one of v's predecessors, v_j ∈ D(v), is considered,
then the earliest available time EAT_i(v, v_j) for the primary/
backup copies of task v depends on the finish time f(v_j),
the earliest message start time MST_ik(e), and the
transmission time w_ik × |e| of the message e sent from v_j to v,
where p_k = p(v_j). Thus,

$$EAT_i(v, v_j) = \begin{cases} f(v_j) & \text{if } p_i = p_k \\ MST_{ik}(e) + w_{ik} \cdot |e| & \text{otherwise} \end{cases} \qquad (4)$$

Now consider all predecessors of v. Clearly, v must wait
until the last message from all its predecessors has
arrived. Thus the earliest available time for v^P on p_i,
EAT_i^P(v), is the maximum of EAT_i(v, v_j) over all the
predecessors:

$$EAT_i^P(v) = \max_{v_j \in D(v)} \left\{ EAT_i(v, v_j^P) \right\} \qquad (5)$$
Based on expression (5), EST_i^P(v) on p_i can be
computed by checking the queue VQ_i to find out whether the
processor has an idle time slot that starts later than the task's
EAT_i^P(v) and is large enough to accommodate the task.
This procedure is described in step 2.2.1 of the algorithm.
EST_i^P(v) is applied to derive EST^P(v), the earliest start
time for v^P on any processor. The expression for EST^P(v) is
given below.

$$EST^P(v) = \min_{p_i \in P''} \left\{ EST_i^P(v) \right\} \qquad (6)$$

where $P'' = \{ p_i \in P' \mid \lambda_i \times c_i(v) = \min_{p_j \in P'} \{ \lambda_j \times c_j(v) \} \}$,
and $P' = \{ p_i \in P \mid EST_i^P(v) + c_i(v) < d(v) \}$.
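Equations (4) and (5) translate directly into code. Below is a sketch with invented predecessor data; each predecessor record carries its processor, finish time, message start time MST, link weight w and message volume.

```python
# Sketch of equations (4) and (5): the earliest available time of v on
# p_i is the latest, over all predecessors v_j, of either f(v_j) (when
# v_j runs on p_i) or the message arrival time MST + w * |e|.
# The predecessor records below are invented example data.

def eat_single(p_i, pred):
    """Equation (4): EAT_i(v, v_j) for one predecessor record."""
    if pred["proc"] == p_i:
        return pred["f"]                          # message stays local
    return pred["mst"] + pred["w"] * pred["vol"]  # remote transfer delay

def eat_primary(p_i, preds):
    """Equation (5): EAT_i^P(v) = max over all predecessors."""
    return max(eat_single(p_i, d) for d in preds)

preds = [
    {"proc": "p1", "f": 8, "mst": 9, "w": 2, "vol": 3},   # remote: 9 + 6 = 15
    {"proc": "p2", "f": 12, "mst": 0, "w": 0, "vol": 0},  # local on p2: 12
]
print(eat_primary("p2", preds))   # 15
```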
EST^B(v), the earliest start time for v^B, is computed in a
more complex way than EST^P(v). This is because the set
of predecessors of v^P, D^P(v), contains exclusively the
primary copies of v's predecessor tasks, whereas the set of
predecessors of v^B, B(v), may contain a certain
combination of the primary and backup copies of v's
predecessor tasks. In order to decide B(v), it is necessary
to introduce the notion of a strong primary copy, as follows.
Note that there are two cases in which v^P may fail to
execute: (1) p(v^P) fails before time f(v^P), and (2) v^P fails to
receive messages from all its predecessors. Case (2) is
illustrated by a simple example in Fig. 2, where dotted
lines denote messages sent from predecessors to
successors. Let v_j be a predecessor of v, with p(v) ≠ p(v_j).
Suppose that at some time t < f(v_j^P), p(v_j^P) fails; then v_j^B should
execute. If v_j^B is not schedule-preceding v^P, then v^P cannot
receive any message from v_j^B. Hence, even if p(v^P)
does not fail, v^P still cannot execute. The primary copy of
a task that never encounters case (2) is referred to as a
strong primary copy, as formally defined below.
Definition 1. Given a task v, v^P is a strong primary copy
if and only if the execution of v^B implies the failure of
p(v^P) before time f(v^P). Equivalently, v^P is a strong
primary copy if and only if the absence of a failure of
p(v^P) by time f(v^P) implies the execution of v^P.
Recall the assumption that only one processor may
encounter a permanent failure. We observe that if v_i is
a predecessor of v_j, and the primary copies of both tasks
are strong primary copies, then v_i^B is not message-preceding
v_j^B. Fig. 3 illustrates a scenario of this case,
which is presented formally in Theorem 1; the theorem is
helpful in determining the set of predecessors of a
backup copy (see step 3.6).
Theorem 1. Given two tasks v_i and v_j, where v_i is a predecessor
of v_j: v_i^B is not message-preceding v_j^B, meaning that v_i^B
does not need to send a message to v_j^B, if v_i^P and v_j^P are both
strong primary copies and p(v_i^P) ≠ p(v_j^P).
Proof: Since v_i^P and v_j^P are both strong primary copies,
according to Definition 1, v_i^B and v_j^B can both execute if
and only if both v_i^P and v_j^P have failed to execute due to
processor failures. But v_i^P and v_j^P are allocated to two
different processors, an impossibility under the single-failure
assumption. Thus, at least one of v_i^B and v_j^B will not execute,
implying that no message needs to be sent from v_i^B to v_j^B.
Let B(v) ⊆ V be the set of predecessors of v^B. It is
defined as follows:

B(v) = {v_i^P | v_i ∈ D(v)} ∪ {v_i^B | v_i ∈ D(v) ∧ (v_i^P is not a strong primary copy ∨ v^P is not a strong primary copy ∨ p(v_i^P) = p(v^P))} = D^P(v) ∪ D^B(v)   (7)
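Expression (7) amounts to a small set computation. Below is a sketch with hypothetical predecessor records (name, processor of the primary copy, whether that primary is strong).

```python
# Sketch of Expression (7): B(v) always contains every predecessor's
# primary copy (D^P(v)); a predecessor's backup copy joins B(v) only if
# v_i^P is not strong, or v^P is not strong, or p(v_i^P) = p(v^P).

def backup_predecessors(preds, v_primary_proc, v_primary_strong):
    """preds: list of (name, primary_proc, primary_is_strong) tuples."""
    b = [(name, "P") for name, _, _ in preds]        # D^P(v)
    for name, proc, strong in preds:                 # D^B(v)
        if (not strong) or (not v_primary_strong) or proc == v_primary_proc:
            b.append((name, "B"))
    return b

preds = [("v1", "p1", True), ("v2", "p3", False)]
# v^P is strong and runs on p1: v1^B joins because p(v1^P) = p(v^P);
# v2^B joins because v2^P is not a strong primary copy.
print(backup_predecessors(preds, "p1", True))
# [('v1', 'P'), ('v2', 'P'), ('v1', 'B'), ('v2', 'B')]
```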
In the eFRCD algorithm, the primary copy of a task is allocated
before its corresponding backup copy is scheduled.
Hence, given a task v and its predecessor v_i ∈ D(v), both
copies of v_i have already been allocated when the
algorithm starts scheduling v^B. Obviously, v^B must receive
Fig. 2 Since processor p_1 fails, v_i^B executes. Because v_j^P cannot receive a message from v_i^B, v_j^B must execute instead of v_j^P. [figure omitted]

Fig. 3 (v_i, v_j) ∈ E; v_i^P and v_j^P are both strong primary copies, and they are scheduled on two different processors. v_i^B is not execution-preceding v_j^B. [figure omitted]

Fig. 4 (v_i, v_j) ∈ E; v_i^B is not schedule-preceding v_j^P, and v_i^P is a strong primary copy. v_j^B cannot be scheduled on the processor on which v_i^P is scheduled. [figure omitted]

Fig. 5 v_i is the predecessor of v_j; v_i^P and v_j^P are scheduled on the same processor, and v_i^P is the strong primary copy. In this case, v_i^B is not execution-preceding v_j^P. [figure omitted]
