
Performance-effective and low-complexity task scheduling for heterogeneous computing

TL;DR: Two novel scheduling algorithms for a bounded number of heterogeneous processors with an objective to simultaneously meet high performance and fast scheduling time are presented, called the Heterogeneous Earliest-Finish-Time (HEFT) algorithm and the Critical-Path-on-a-Processor (CPOP) algorithm.
Abstract: Efficient application scheduling is critical for achieving high performance in heterogeneous computing environments. The application scheduling problem has been shown to be NP-complete in general cases as well as in several restricted cases. Because of its key importance, this problem has been extensively studied and various algorithms have been proposed in the literature which are mainly for systems with homogeneous processors. Although there are a few algorithms in the literature for heterogeneous processors, they usually require significantly high scheduling costs and they may not deliver good quality schedules with lower costs. In this paper, we present two novel scheduling algorithms for a bounded number of heterogeneous processors with an objective to simultaneously meet high performance and fast scheduling time, which are called the Heterogeneous Earliest-Finish-Time (HEFT) algorithm and the Critical-Path-on-a-Processor (CPOP) algorithm. The HEFT algorithm selects the task with the highest upward rank value at each step and assigns the selected task to the processor, which minimizes its earliest finish time with an insertion-based approach. On the other hand, the CPOP algorithm uses the summation of upward and downward rank values for prioritizing tasks. Another difference is in the processor selection phase, which schedules the critical tasks onto the processor that minimizes the total execution time of the critical tasks. In order to provide a robust and unbiased comparison with the related work, a parametric graph generator was designed to generate weighted directed acyclic graphs with various characteristics. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithms significantly surpass previous approaches in terms of both quality and cost of schedules, which are mainly presented with schedule length ratio, speedup, frequency of best results, and average scheduling time metrics.

Summary (3 min read)

1 INTRODUCTION

  • In the next section, the authors define the research problem and the related terminology.
  • Section 4 introduces their scheduling algorithms (the HEFT and the CPOP Algorithms).
  • Section 5 presents a comparison study of their algorithms with the related work, which is based on randomly generated task graphs and task graphs of several real applications.

3.1 Task-Scheduling Heuristics for Heterogeneous Environments

  • The first phase groups the tasks that can be executed in parallel using the level attribute.
  • The second phase assigns each task to the fastest available processor.
  • Within the same level, the task with the highest computation cost has the highest priority.
  • Each task is assigned to a processor that minimizes the sum of the task's computation cost and the total communication costs with tasks in the previous levels.

4.1 Graph Attributes Used by HEFT and CPOP Algorithms

  • The downward ranks are computed recursively by traversing the task graph downward starting from the entry task of the graph.
  • For the entry task n entry , the downward rank value is equal to zero.

5 EXPERIMENTAL RESULTS AND DISCUSSION

  • The authors present the comparative evaluation of their algorithms and the related work given in Section 3.1.
  • For this purpose, the authors consider two sets of graphs as the workload for testing the algorithms: randomly generated application graphs and graphs that represent some numerical real-world problems.
  • First, the authors present the metrics used for performance evaluation, which is followed by two sections on experimental results.

5.1 Comparison Metrics

  • The SLR of a graph (using any algorithm) cannot be less than one since the denominator is the lower bound.
  • The taskscheduling algorithm that gives the lowest SLR of a graph is the best algorithm with respect to performance.
  • Average SLR values over several task graphs are used in their experiments.
  • The speedup value for a given graph is computed by dividing the sequential execution time (i.e., the cumulative computation costs of the tasks in the graph) by the parallel execution time (i.e., the makespan of the output schedule).
  • The sequential execution time is computed by assigning all tasks to the single processor that minimizes the cumulative computation costs (see the sketch after this list).
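As a concrete illustration of these two metrics, the following is a minimal sketch, assuming w[i][j] is the computation cost of task i on processor j (the paper's W matrix) and cp_min_tasks is the set of tasks on the critical path computed with minimum costs; all names are illustrative, not from the source.

    # Sketch of the SLR and speedup metrics; w[i][j] is the computation cost
    # of task i on processor j, cp_min_tasks the minimum-cost critical path.
    def schedule_length_ratio(makespan, cp_min_tasks, w):
        # Denominator: each critical-path task charged at its cheapest
        # processor, ignoring communication -- a lower bound on the makespan.
        lower_bound = sum(min(w[i]) for i in cp_min_tasks)
        return makespan / lower_bound  # >= 1 for any valid schedule

    def speedup(makespan, w):
        # Sequential time: all tasks on the single processor that minimizes
        # the cumulative computation cost.
        num_procs = len(w[0])
        sequential = min(sum(row[j] for row in w) for j in range(num_procs))
        return sequential / makespan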

Number of Occurrences of Better Quality of Schedules

  • The number of times that each algorithm produced better, worse, and equal quality of schedules compared to every other algorithm is counted in the experiments (a counting sketch follows this list).
  • The running time (or the scheduling time) of an algorithm is its execution time for obtaining the output schedule of a given task graph.
  • This metric basically gives the average cost of each algorithm.
  • Among the algorithms that give comparable SLR values, the one with the minimum running time is the most practical implementation.
  • The minimization of SLR by checking all possible task-processor pairs can conflict with the minimization of the running time.
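The counting described above can be sketched as follows, assuming a hypothetical table makespans[alg][g] of schedule lengths per algorithm and per graph (names are illustrative, not from the source).

    # Sketch of the pairwise quality count; makespans[alg][g] is a
    # hypothetical table of schedule lengths per algorithm and graph.
    from collections import defaultdict

    def count_outcomes(makespans, graphs):
        counts = defaultdict(lambda: {"better": 0, "equal": 0, "worse": 0})
        algs = list(makespans)
        for g in graphs:
            for a in algs:
                for b in algs:
                    if a == b:
                        continue
                    if makespans[a][g] < makespans[b][g]:
                        counts[(a, b)]["better"] += 1
                    elif makespans[a][g] == makespans[b][g]:
                        counts[(a, b)]["equal"] += 1
                    else:
                        counts[(a, b)]["worse"] += 1
        return counts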

5.2 Randomly Generated Application Graphs

  • In their study, the authors first considered the randomly generated application graphs.
  • A random graph generator was implemented to generate weighted application DAGs with various characteristics that depend on several input parameters given below.
  • The authors' simulation-based framework allows assigning sets of values to the parameters used by the random graph generator.
  • This framework first executes the random graph generator program to construct the application DAGs, which is followed by the execution of the scheduling algorithms to generate output schedules, and, finally, it computes the performance metrics based on the schedules.

5.2.1 Random Graph Generator

  • These combinations give 2,250 different DAG types.
  • Since 25 random DAGs were generated for each DAG type, the total number of DAGs used in their experiments was 56,250.
  • Assigning several input parameters and selecting each parameter from a large set causes the generation of diverse DAGs with various characteristics; a sketch of this parameter sweep follows the list.
  • Experiments based on diverse DAGs prevent biasing toward a particular scheduling algorithm.
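The sweep over DAG types can be sketched as below. The parameter names and value sets are placeholders, not the paper's actual sets (which yield the 2,250 DAG types mentioned above); only the structure of the sweep is shown.

    # Sketch of the DAG-type sweep; the value sets are placeholders, not the
    # paper's actual sets (which produce 2,250 parameter combinations).
    from itertools import product

    param_sets = {
        "num_tasks": [20, 40, 60, 80, 100],   # placeholder values
        "ccr": [0.1, 0.5, 1.0, 5.0, 10.0],    # communication-to-computation
        "shape": [0.5, 1.0, 2.0],             # placeholder shape parameter
    }
    DAGS_PER_TYPE = 25

    dag_types = list(product(*param_sets.values()))
    total_dags = len(dag_types) * DAGS_PER_TYPE  # types x 25 random DAGs
    for combo in dag_types:
        params = dict(zip(param_sets, combo))
        for seed in range(DAGS_PER_TYPE):
            # generate_random_dag is a hypothetical generator hook:
            # dag = generate_random_dag(seed=seed, **params)
            pass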

5.2.2 Performance Results

  • Finally, the number of times that each scheduling algorithm in the experiments produced better, worse, or equal schedule length compared to every other algorithm was counted for the 56,250 DAGs used.
  • Each cell in Table 2 indicates the comparison results of the algorithm on the left with the algorithm on the top.
  • The "combined" column shows the percentage of graphs in which the algorithm on the left gives a better, equal, or worse performance than all other algorithms combined.
  • The ranking of the algorithms, based on occurrences of best results, is {HEFT, DLS, CPOP, MH, LMT}.
  • The ranking with respect to average SLR values was: {HEFT, CPOP, DLS, MH, LMT}.

5.3.1 Gaussian Elimination

  • For the efficiency comparison, the number of processors used in their experiments is varied from 2 to 16 in powers of 2; the CCR and range percentage parameters have the same set of values.
  • Fig. 9b gives efficiency comparison for Gaussian elimination graphs when the matrix size is 50.
  • The HEFT and DLS algorithms have better efficiency than the other algorithms.
  • Since the matrix size is fixed, an increase in the number of processors decreases the makespan for each algorithm.
  • As an example, when the matrix size is 50 for 16 processors, the DLS algorithm takes 16.2 times longer than the HEFT algorithm to schedule a given graph.

5.3.3 Molecular Dynamics Code

  • This application is part of their performance evaluation since it has an irregular task graph.
  • Since the number of tasks is fixed in the application and the structure of the graph is known, only the values of CCR and range percentage parameters (in Section 5.2) are used in their experiments.
  • Fig. 14a shows the performance of the algorithms with respect to five different CCR values when the number of processors is equal to six.
  • It was also observed that the DLS and LMT algorithms take a running time almost three times longer than the other three algorithms (HEFT, CPOP, and MH).

6 ALTERNATE POLICIES FOR THE PHASES OF THE HEFT ALGORITHM

  • The original HEFT algorithm outperforms these alternates for small CCR graphs.
  • For high CCR graphs, some benefit has been observed by taking critical child tasks into account during processor selection.
  • For moderate CCR values, the B1 policy slightly outperforms the original HEFT algorithm.
  • For the highest CCR values, the B2 policy outperforms the original algorithm and the other alternates by 4 percent.

7 CONCLUSIONS

  • The authors plan to extend the HEFT Algorithm for rescheduling tasks in response to changes in processor and network loads.
  • This extension may provide some bounds on the degradation of the makespan when the number of available processors is not sufficient.
  • It is also planned to extend these algorithms to arbitrarily connected networks by considering link contention.



Performance-Effective and Low-Complexity
Task Scheduling for Heterogeneous Computing
Haluk Topcuoglu, Member, IEEE, Salim Hariri, Member, IEEE Computer Society, and Min-You Wu, Senior Member, IEEE
Abstract: Efficient application scheduling is critical for achieving high performance in heterogeneous computing environments. The
application scheduling problem has been shown to be NP-complete in general cases as well as in several restricted cases. Because of
its key importance, this problem has been extensively studied and various algorithms have been proposed in the literature which are
mainly for systems with homogeneous processors. Although there are a few algorithms in the literature for heterogeneous processors,
they usually require significantly high scheduling costs and they may not deliver good quality schedules with lower costs. In this paper,
we present two novel scheduling algorithms for a bounded number of heterogeneous processors with an objective to simultaneously
meet high performance and fast scheduling time, which are called the Heterogeneous Earliest-Finish-Time (HEFT) algorithm
and the Critical-Path-on-a-Processor (CPOP) algorithm. The HEFT algorithm selects the task with the highest upward rank
value at each step and assigns the selected task to the processor, which minimizes its earliest finish time with an insertion-based
approach. On the other hand, the CPOP algorithm uses the summation of upward and downward rank values for prioritizing
tasks. Another difference is in the processor selection phase, which schedules the critical tasks onto the processor that minimizes
the total execution time of the critical tasks. In order to provide a robust and unbiased comparison with the related work, a
parametric graph generator was designed to generate weighted directed acyclic graphs with various characteristics. The
comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our
scheduling algorithms significantly surpass previous approaches in terms of both quality and cost of schedules, which are mainly
presented with schedule length ratio, speedup, frequency of best results, and average scheduling time metrics.
Index Terms: DAG scheduling, task graphs, heterogeneous systems, list scheduling, mapping.
1 INTRODUCTION
Diverse sets of resources interconnected with a high-speed network provide a new computing platform,
called the heterogeneous computing system, which can
support executing computationally intensive parallel and
distributed applications. A heterogeneous computing
system requires compile-time and runtime support for
executing applications. The efficient scheduling of the
tasks of an application on the available resources is one of
the key factors for achieving high performance.
The general task scheduling problem includes the
problem of assigning the tasks of an application to suitable
processors and the problem of ordering task executions on
each resource. When the characteristics of an application, including the execution times of tasks, the sizes of data communicated between tasks, and the task dependencies, are known a priori, the application is represented with a static model.
In the general form of a static task scheduling problem, an application is represented by a directed acyclic graph (DAG) in which nodes represent application tasks and
label shows computation cost (expected computation time)
of the task and each edge label shows intertask commu-
nication cost (expected communication time) between
tasks. The objective function of this problem is to map
tasks onto processors and order their executions so that
task-precedence requirements are satisfied and a mini-
mum overall completion time is obtained. The task
scheduling problem is NP-complete in the general case
[1], as well as some restricted cases [2], such as scheduling
tasks with one or two time units to two processors and
scheduling unit-time tasks to an arbitrary number of
processors.
Because of its key importance on performance, the task
scheduling problem in general has been extensively studied
and various heuristics were proposed in the literature [3],
[4], [5], [6], [7], [8], [9], [10], [11], [13], [12], [16], [17], [18],
[20], [22], [23], [27], [30]. These heuristics are classified into a
variety of categories (such as list-scheduling algorithms,
clustering algorithms, duplication-based algorithm, guided
random search methods) and they are mainly for systems
with homogeneous processors.
In a list scheduling algorithm [3], [4], [6], [7], [18], [22], an
ordered list of tasks is constructed by assigning a priority to
each task. Tasks are selected in the order of their priorities
and each selected task is scheduled to a processor which
minimizes a predefined cost function. The algorithms in
this category provide good quality of schedules and their
performance is comparable with the other categories at a lower scheduling time [21], [26].

H. Topcuoglu is with the Computer Engineering Department, Marmara University, Goztepe Kampusu, 81040, Istanbul, Turkey. E-mail: haluk@eng.marmara.edu.tr.
S. Hariri is with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721-0104. E-mail: hariri@ece.arizona.edu.
M.-Y. Wu is with the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131-1356. E-mail: wu@eece.unm.edu.
Manuscript received 28 Aug. 2000; revised 12 July 2001; accepted 6 Sept. 2001. For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number 112783.

The clustering algorithms [3], [12], [19], [25] are, in general, for an unbounded number of processors, so they may not be directly applicable. A
clustering algorithm requires a second phase (a scheduling
module) to merge the task clusters generated by the
algorithm onto a bounded number of processors and to
order the task executions within each processor [24].
Similarly, task duplication-based heuristics are not practical
because of their significantly high time complexity. As an
example, the time complexity of the BTDH Algorithm [30]
and the DSH Algorithm [18] are O(v^4); the complexity of the CPFD Algorithm [9] is O(e · v^2) for scheduling v tasks connected with e edges on a set of homogeneous processors.
Genetic Algorithms [5], [8], [11], [13], [17], [31] (GAs) are
of the most widely studied guided random search techni-
ques for the task scheduling problem. Although they
provide good quality of schedules, their execution times
are significantly higher than the other alternatives. It was
shown that the improvement of the GA-based solution to
the second best solution was not more than 10 percent and
the GA-based approach required around a minute to
produce a solution, while the other heuristics required an
execution of a few seconds [31]. Additionally, extensive
tests are required to find optimal values for the set of
control parameters used in GA-based solutions.
The task scheduling problem has also been studied by a
few research groups for the heterogeneous systems [6], [7],
[8], [10], [11], [13], [14]. These algorithms may require
assigning a set of control parameters, and some of them incur substantially high scheduling costs [6],
[8], [11], [13]. Some of them partition the tasks in a DAG into
levels such that there will be no dependency between tasks
in the same level [10], [14]. This level-by-level scheduling
technique considers the tasks only in the current level (that
is, a subset of ready tasks) at any time, which may not
perform well because it does not consider all ready tasks.
Additionally, the study given in [14] presents a dynamic
remapper that requires an initial schedule of a given DAG
and then improves its performance using three variants of
an algorithm, which is out of the scope of this paper.
In this paper, we propose two new static scheduling
algorithms for a bounded number of fully connected
heterogeneous processors: the Heterogeneous Earliest-
Finish-Time (HEFT) algorithm and the Critical-Path-on-a-
Processor (CPOP) algorithm. Although the static-schedul-
ing for heterogeneous systems is offline, in order to
provide a practical solution, the scheduling time (or
running time) of an algorithm is the key constraint.
Therefore, the motivation behind these algorithms is to
deliver good-quality schedules (or outputs with better schedule lengths) at lower costs (i.e., lower scheduling times). The HEFT Algorithm selects the task with the
highest upward rank (defined in Section 4.1) at each step.
The selected task is then assigned to the processor which
minimizes its earliest finish time with an insertion-based
approach. The upward rank of a task is the length of the
critical path (i.e., the longest path) from the task to an exit
task, including the computation cost of the task. The
CPOP algorithm selects the task with the highest (upward
rank + downward rank) value at each step. The algorithm
targets scheduling of all critical tasks (i.e., tasks on the
critical path of the DAG) onto a single processor, which
minimizes the total execution time of the critical tasks. If
the selected task is noncritical, the processor selection
phase is based on earliest execution time with insertion-
based scheduling, as in the HEFT Algorithm.
As part of this research work, a parametric graph
generator has been designed to generate weighted directed
acyclic graphs for the performance study of the scheduling
algorithms. The graph generator targets the generation of
many types of DAGs using several input parameters that
provide an unbiased comparison of task-scheduling algo-
rithms. The comparison study in this paper is based on both
randomly generated task graphs and the task graphs of real
applications, including the Gaussian Elimination Algorithm
[3], [28], FFT Algorithm [29], [30], and a molecular dynamic
code given in [19]. The comparison study shows that our
algorithms significantly surpass previous approaches in
terms of both performance metrics (schedule length ratio,
speedup, efficiency, and number of occurrences giving best
results) and a cost metric (scheduling time to deliver an
output schedule).
The remainder of this paper is organized as follows: In
the next section, we define the research problem and the
related terminology. In Section 3, we provide a taxonomy of
task-scheduling algorithms and the related work in
scheduling for heterogeneous systems. Section 4 introduces
our scheduling algorithms (the HEFT and the CPOP
Algorithms). Section 5 presents a comparison study of our
algorithms with the related work, which is based on
randomly generated task graphs and task graphs of
several real applications. In Section 6, we introduce
several extensions to the HEFT algorithm. The summary
of the research presented and planned future work is
given in Section 7.
2 TASK-SCHEDULING PROBLEM
A scheduling system model consists of an application, a target computing environment, and a performance criterion for scheduling. An application is represented by a directed acyclic graph, G = (V, E), where V is the set of v tasks and E is the set of e edges between the tasks. (The terms task and node are used interchangeably in this paper.) Each edge (i, j) ∈ E represents the precedence constraint that task n_i must complete its execution before task n_j starts. Data is a v × v matrix of communication data, where data_{i,k} is the amount of data required to be transmitted from task n_i to task n_k.
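A minimal sketch of this model as a data structure, with illustrative names only, might look like:

    # Sketch of the task-graph model: v tasks, edge set E, and the v x v
    # communication-data matrix; all names are illustrative.
    class TaskGraph:
        def __init__(self, num_tasks):
            self.num_tasks = num_tasks                 # |V| = v
            self.edges = set()                         # E, pairs (i, k)
            v = num_tasks
            self.data = [[0.0] * v for _ in range(v)]  # data[i][k]

        def add_edge(self, i, k, amount):
            # Edge (i, k): task n_i must finish before n_k starts and
            # sends data[i][k] units of data to it.
            self.edges.add((i, k))
            self.data[i][k] = amount

        def successors(self, i):
            return [k for (a, k) in self.edges if a == i]

        def predecessors(self, k):
            return [a for (a, b) in self.edges if b == k]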
In a given task graph, a task without any parent is
called an entry task and a task without any child is called
an exit task. Some of the task scheduling algorithms may
require single-entry and single-exit task graphs. If there is
more than one exit (entry) task, they are connected to a
zero-cost pseudo exit (entry) task with zero-cost edges, which
does not affect the schedule.
We assume that the target computing environment
consists of a set Q of q heterogeneous processors
connected in a fully connected topology in which all
interprocessor communications are assumed to perform
without contention. In our model, it is also assumed that

computation can be overlapped with communication. Additionally, task executions of a given application are assumed to be nonpreemptive. W is a v × q computation cost matrix in which each w_{i,j} gives the estimated execution time to complete task n_i on processor p_j. Before scheduling, the tasks are labeled with their average execution costs. The average execution cost of a task n_i is defined as

\bar{w}_i = \sum_{j=1}^{q} w_{i,j} / q.   (1)

The data transfer rates between processors are stored in matrix B of size q × q. The communication startup costs of processors are given in a q-dimensional vector L. The communication cost of edge (i, k), which is for transferring data from task n_i (scheduled on p_m) to task n_k (scheduled on p_n), is defined by

c_{i,k} = L_m + data_{i,k} / B_{m,n}.   (2)

When both n_i and n_k are scheduled on the same processor, c_{i,k} becomes zero, since we assume that the intraprocessor communication cost is negligible when compared with the interprocessor communication cost. Before scheduling, average communication costs are used to label the edges. The average communication cost of an edge (i, k) is defined by

\bar{c}_{i,k} = \bar{L} + data_{i,k} / \bar{B},   (3)

where \bar{B} is the average transfer rate among the processors in the domain and \bar{L} is the average communication startup time.
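Equations (1)-(3) map directly to code. A sketch under the definitions above (w is v × q, B is q × q, L has length q, and data is the v × v matrix); the function names are illustrative:

    # Sketch of the cost labels in (1)-(3).
    def avg_computation_cost(w, i):
        return sum(w[i]) / len(w[i])               # equation (1)

    def communication_cost(L, B, data, i, k, m, n):
        # Cost of edge (i, k) with n_i on p_m and n_k on p_n; zero when
        # both tasks share a processor (intraprocessor cost is negligible).
        if m == n:
            return 0.0
        return L[m] + data[i][k] / B[m][n]         # equation (2)

    def avg_communication_cost(L_bar, B_bar, data, i, k):
        return L_bar + data[i][k] / B_bar          # equation (3)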
Before presenting the objective function, it is necessary to define the EST and EFT attributes, which are derived from a given partial schedule. EST(n_i, p_j) and EFT(n_i, p_j) are the earliest execution start time and the earliest execution finish time of task n_i on processor p_j, respectively. For the entry task n_{entry},

EST(n_{entry}, p_j) = 0.   (4)

For the other tasks in the graph, the EFT and EST values are computed recursively, starting from the entry task, as shown in (5) and (6), respectively. In order to compute the EFT of a task n_i, all immediate predecessor tasks of n_i must have been scheduled:

EST(n_i, p_j) = max{ avail[j], max_{n_m ∈ pred(n_i)} ( AFT(n_m) + c_{m,i} ) },   (5)

EFT(n_i, p_j) = w_{i,j} + EST(n_i, p_j),   (6)

where pred(n_i) is the set of immediate predecessor tasks of task n_i and avail[j] is the earliest time at which processor p_j is ready for task execution. If n_k is the last assigned task on processor p_j, then avail[j] is the time at which processor p_j completed the execution of task n_k and is ready to execute another task, under a noninsertion-based scheduling policy. The inner max block in the EST equation returns the ready time, i.e., the time when all data needed by n_i has arrived at processor p_j.
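A sketch of the recursion in (4)-(6), plus the makespan defined in (7) below; pred, aft, c, and avail are assumed to be supplied by the surrounding scheduler:

    # Sketch of equations (4)-(7); pred(i) lists immediate predecessors,
    # aft[m] is the actual finish time of scheduled task m, c(m, i) the
    # realized communication cost, avail[j] the processor-ready time.
    def est(i, j, pred, aft, c, avail):
        preds = pred(i)
        if not preds:                                    # entry task, (4)
            return 0.0
        ready = max(aft[m] + c(m, i) for m in preds)     # data-ready time
        return max(avail[j], ready)                      # equation (5)

    def eft(i, j, w, pred, aft, c, avail):
        return w[i][j] + est(i, j, pred, aft, c, avail)  # equation (6)

    def makespan(aft, exit_tasks):
        return max(aft[i] for i in exit_tasks)           # equation (7)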
After a task n_m is scheduled on a processor p_j, the earliest start time and the earliest finish time of n_m on processor p_j are equal to the actual start time, AST(n_m), and the actual finish time, AFT(n_m), of task n_m, respectively. After all tasks in a graph are scheduled, the schedule length (i.e., the overall completion time) is the actual finish time of the exit task n_{exit}. If there are multiple exit tasks and the convention of inserting a pseudo exit task is not applied, the schedule length (also called the makespan) is defined as

makespan = max{ AFT(n_{exit}) }.   (7)

The objective function of the task-scheduling problem is to determine the assignment of the tasks of a given application to processors such that the schedule length is minimized.
3 RELATED WORK
Static task-scheduling algorithms can be classified into two
main groups (see Fig. 1), heuristic-based and guided
random-search-based algorithms. The former can be further
classified into three groups: list scheduling heuristics,
clustering heuristics, and task duplication heuristics.
List Scheduling Heuristics. A list-scheduling heuristic
maintains a list of all tasks of a given graph according to
their priorities. It has two phases: the task prioritizing (or task
selection) phase for selecting the highest-priority ready task
and the processor selection phase for selecting a suitable
processor that minimizes a predefined cost function (which
can be the execution start time). Some of the examples are
the Modified Critical Path (MCP) [3], Dynamic Level
Scheduling [6], Mapping Heuristic (MH) [7], Insertion-
Scheduling Heuristic [18], Earliest Time First (ETF) [22],
and Dynamic Critical Path (DCP) [4] algorithms. Most of
the list-scheduling algorithms are for a bounded number
of fully connected homogeneous processors. List-schedul-
ing heuristics are generally more practical and provide
better performance results at a lower scheduling time than
the other groups.
Clustering Heuristics. An algorithm in this group maps
the tasks in a given graph to an unlimited number of
clusters. At each step, the selected tasks for clustering can
be any task, not necessarily a ready task. Each iteration
refines the previous clustering by merging some clusters. If
two tasks are assigned to the same cluster, they will be
executed on the same processor. A clustering heuristic
requires additional steps to generate a final schedule: a
cluster merging step for merging the clusters so that the
remaining number of clusters equal the number of
processors, a cluster mapping step for mapping the clusters
on the available processors, and a task ordering step for
ordering the mapped tasks within each processor [24].
Some examples in this group are the Dominant Sequence
Clustering (DSC) [12], Linear Clustering Method [19],
Mobility Directed [3], and Clustering and Scheduling
System (CASS) [25].
Task Duplication Heuristics. The idea behind duplica-
tion-based scheduling algorithms is to schedule a task
graph by mapping some of its tasks redundantly, which
reduces the interprocess communication overhead [9], [18],
[27], [30]. Duplication-based algorithms differ according to
the selection strategy of the tasks for duplication. The
262 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 3, MARCH 2002

algorithms in this group are usually for an unbounded
number of identical processors and they have much higher
complexity values than the algorithms in the other groups.
Guided Random Search Techniques. Guided random
search techniques (or randomized search techniques) use
random choice to guide themselves through the problem
space, which is not the same as performing merely random
walks as in the random search methods. These techniques
combine the knowledge gained from previous search
results with some randomizing features to generate new
results. Genetic algorithms (GAs) [5], [8], [11], [13], [17] are
the most popular and widely used techniques for several
flavors of the task scheduling problem. GAs generate good
quality of output schedules; however, their scheduling
times are usually much higher than the heuristic-based
techniques [31]. Additionally, several control parameters in
a genetic algorithm should be determined appropriately.
The optimal set of control parameters used for scheduling
a task graph may not give the best results for another
task graph. In addition to GAs, simulated annealing [11],
[15] and local search [16], [20] are the other
methods in this group.
3.1 Task-Scheduling Heuristics for Heterogeneous Environments
This section presents the reported task-scheduling heuristics
that support heterogeneous processors, which are the
Dynamic Level Scheduling Algorithm [6], the Levelized-
Min Time Algorithm [10], and the Mapping Heuristic
Algorithm [7].
Dynamic-Level Scheduling (DLS) Algorithm. At each step, the algorithm selects the (ready node, available processor) pair that maximizes the value of the dynamic level, which is equal to DL(n_i, p_j) = rank_u^s(n_i) - EST(n_i, p_j). The computation cost of a task is the median value of the computation costs of the task on the processors. In this algorithm, the upward rank calculation does not consider the communication costs. For heterogeneous environments, a new term is added for the difference between the task's median execution time on all processors and its execution time on the current processor. The general DLS algorithm has an O(v^3 · q) time complexity, where v is the number of tasks and q is the number of processors.
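A sketch of one DLS selection step as described above; ready_nodes, rank_s_u, and est are assumed to come from the surrounding scheduler:

    # Sketch of one DLS step: pick the (ready node, processor) pair
    # maximizing DL(n_i, p_j) = rank_u^s(n_i) - EST(n_i, p_j).
    def dls_select(ready_nodes, processors, rank_s_u, est):
        best = None
        for i in ready_nodes:
            for j in processors:
                dl = rank_s_u(i) - est(i, j)
                if best is None or dl > best[0]:
                    best = (dl, i, j)
        _, task, proc = best
        return task, proc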
Mapping Heuristic (MH). In this algorithm, the computation cost of a task on a processor is computed as the number of instructions to be executed in the task divided by the speed of the processor. However, in setting the computation costs of tasks and the communication costs of edges before scheduling, similar processing elements (i.e., homogeneous processors) are assumed; the heterogeneity comes into the picture during the scheduling process.
This algorithm uses static upward ranks to assign priorities. (The authors also experimented with adding the communication delay to the rank values.) In this algorithm, the ready time of a processor for a task is the time when the processor has finished its last assigned task and is ready to execute a new one. The MH algorithm does not schedule a task in an idle time slot between two already-scheduled tasks. The time complexity, when contention is considered, is equal to O(v^2 · q^3) for v tasks and q processors; otherwise, it is equal to O(v^2 · q).
Levelized-Min Time (LMT) Algorithm. This is a two-phase algorithm. The first phase groups the tasks that can be executed in parallel using the level attribute. The second phase assigns each task to the fastest available processor. A task in a lower level has higher priority than a task in a higher level. Within the same level, the task with the highest computation cost has the highest priority. Each task is assigned to a processor that minimizes the sum of the task's computation cost and the total communication costs with tasks in the previous levels. For a fully connected graph, the time complexity is O(v^2 · q^2) for v tasks and q processors.
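The level attribute used by the first phase can be computed as the length of the longest predecessor chain; a minimal sketch, assuming tasks are indexed in topological order:

    # Sketch of LMT's first phase: group tasks into precedence levels so
    # that tasks within one level are independent of each other.
    def levelize(num_tasks, predecessors):
        level = [0] * num_tasks
        for i in range(num_tasks):        # assumes topological indexing
            for p in predecessors(i):
                level[i] = max(level[i], level[p] + 1)
        groups = {}
        for i, lv in enumerate(level):
            groups.setdefault(lv, []).append(i)
        return groups   # level -> tasks executable in parallel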
Fig. 1. Classification of static task-scheduling algorithms.

4 TASK-SCHEDULING ALGORITHMS
Before introducing the details of the HEFT and CPOP algorithms, we define the graph attributes used for setting the task priorities.
4.1 Graph Attributes Used by HEFT and CPOP Algorithms
Tasks are ordered in our algorithms by their scheduling priorities, which are based on upward and downward ranking. The upward rank of a task n_i is recursively defined by

rank_u(n_i) = \bar{w}_i + max_{n_j ∈ succ(n_i)} ( \bar{c}_{i,j} + rank_u(n_j) ),   (8)

where succ(n_i) is the set of immediate successors of task n_i, \bar{c}_{i,j} is the average communication cost of edge (i, j), and \bar{w}_i is the average computation cost of task n_i. Since the rank is computed recursively by traversing the task graph upward, starting from the exit task, it is called the upward rank. For the exit task n_{exit}, the upward rank value is equal to

rank_u(n_{exit}) = \bar{w}_{exit}.   (9)

Basically, rank_u(n_i) is the length of the critical path from task n_i to the exit task, including the computation cost of task n_i. There are algorithms in the literature that compute the rank value using computation costs only; this variant is called the static upward rank, rank_u^s.
Similarly, the downward rank of a task n_i is recursively defined by

rank_d(n_i) = max_{n_j ∈ pred(n_i)} ( rank_d(n_j) + \bar{w}_j + \bar{c}_{j,i} ),   (10)

where pred(n_i) is the set of immediate predecessors of task n_i. The downward ranks are computed recursively by traversing the task graph downward, starting from the entry task of the graph. For the entry task n_{entry}, the downward rank value is equal to zero. Basically, rank_d(n_i) is the longest distance from the entry task to task n_i, excluding the computation cost of the task itself.
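Equations (8)-(10) translate into two memoized graph traversals. A sketch, where w_bar and c_bar stand for the average costs of Section 2 and succ/pred return immediate neighbors (all names are illustrative):

    # Sketch of upward rank (8)-(9) and downward rank (10).
    from functools import lru_cache

    def make_ranks(w_bar, c_bar, succ, pred):
        @lru_cache(maxsize=None)
        def rank_u(i):
            succs = tuple(succ(i))
            if not succs:                  # exit task, equation (9)
                return w_bar[i]
            return w_bar[i] + max(c_bar(i, j) + rank_u(j) for j in succs)

        @lru_cache(maxsize=None)
        def rank_d(i):
            preds = tuple(pred(i))
            if not preds:                  # entry task: rank_d = 0
                return 0.0
            return max(rank_d(j) + w_bar[j] + c_bar(j, i) for j in preds)

        return rank_u, rank_d

Sorting tasks by decreasing rank_u then yields the priority list used by the HEFT algorithm in Section 4.2.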
4.2 The Heterogeneous-Earliest-Finish-Time (HEFT) Algorithm
The HEFT algorithm (Fig. 2) is an application scheduling algorithm for a bounded number of heterogeneous processors, which has two major phases: a task prioritizing phase for computing the priorities of all tasks and a processor selection phase for selecting the tasks in the order of their priorities and scheduling each selected task on its "best" processor, the one that minimizes the task's finish time.
Task Prioritizing Phase. This phase requires the priority of each task to be set to its upward rank value, rank_u, which is based on mean computation and mean communication costs. The task list is generated by sorting the tasks in decreasing order of rank_u. Tie-breaking is done randomly. There can be alternative tie-breaking policies, such as selecting the task whose immediate successor task(s) have higher upward ranks. Since these alternate policies increase the time complexity, we prefer a random selection strategy. It can easily be shown that the decreasing order of rank_u values provides a topological order of tasks, i.e., a linear order that preserves the precedence constraints.
Processor Selection Phase. For most task scheduling algorithms, the earliest available time of a processor p_j for a task execution is the time when p_j completes the execution of its last assigned task. However, the HEFT algorithm has an insertion-based policy which considers the possible insertion of a task into the earliest idle time slot between two already-scheduled tasks on a processor. The length of an idle time slot, i.e., the difference between the execution start time and finish time of two tasks consecutively scheduled on the same processor, should be at least the computation cost of the task to be scheduled. Additionally, scheduling in this idle time slot should preserve the precedence constraints.
In the HEFT algorithm, the search for an appropriate idle time slot for a task n_i on a processor p_j starts at the ready time of n_i on p_j, i.e., the time when all input data of n_i sent by n_i's immediate predecessor tasks has arrived at processor p_j. The search continues until the first idle time slot capable of holding the computation cost of task n_i is found. The HEFT algorithm has an O(e · q) time complexity for e edges and q processors. For a dense graph, when the number of edges is proportional to v^2 (v is the number of tasks), the time complexity is on the order of O(v^2 · q).
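A sketch of the insertion-based slot search just described; busy is assumed to be the sorted list of (start, finish) intervals already scheduled on the candidate processor (names are illustrative):

    # Sketch of HEFT's insertion-based policy: earliest feasible start for a
    # task with data-ready time `ready` and cost `cost` on one processor.
    def earliest_insertion_start(busy, ready, cost):
        start = ready
        for s, f in busy:                 # busy is sorted by start time
            if s - start >= cost:         # idle slot before this task fits
                return start
            start = max(start, f)         # skip past the busy interval
        return start                      # append after the last task

The candidate EFT on a processor is then earliest_insertion_start(busy, ready, w[i][j]) + w[i][j], and the task goes to the processor with the smallest such value.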
As an illustration, Fig. 4a presents the schedule obtained by the HEFT algorithm for the sample DAG of Fig. 3. The schedule length, which is equal to 80, is shorter than the schedule lengths obtained by the related work; specifically, the schedule lengths of the DLS, MH, and LMT algorithms are 91, 91, and 95, respectively. The first column of Table 1 gives the upward rank values for the given task graph. The scheduling order of the tasks under the HEFT algorithm is {n_1, n_3, n_4, n_2, n_5, n_6, n_9, n_7, n_8, n_10}.
4.3 The Critical-Path-on-a-Processor (CPOP) Algorithm
Although our second algorithm, the CPOP algorithm shown in Fig. 5, has the same task prioritizing and processor selection phases as the HEFT algorithm, it uses the summation of the upward and downward rank values for prioritizing tasks and schedules the critical tasks onto the processor that minimizes the total execution time of those tasks.
Fig. 2. The HEFT algorithm.

Citations
Journal ArticleDOI
01 Feb 2011
TL;DR: StarPU is a runtime system that provides a high-level, unified execution model, giving numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware and to develop and tune powerful scheduling algorithms.
Abstract: In the field of HPC, the current hardware trend is to design multiprocessor architectures featuring heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE) or data-parallel accelerators (e.g. GPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offload parts of the computations. However, designing an execution model that unifies all computing units and associated embedded memory remains a main challenge. We therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and easily develop and tune powerful scheduling algorithms on the other hand. We have developed several strategies that can be selected seamlessly at run-time, and we have analyzed their efficiency on several algorithms running simultaneously over multiple cores and a GPU. In addition to substantial improvements regarding execution times, we have obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine. We eventually show that our dynamic approach competes with the highly optimized MAGMA library and overcomes the limitations of the corresponding static scheduling in a portable way. Copyright © 2010 John Wiley & Sons, Ltd.

1,116 citations

Journal ArticleDOI
TL;DR: The taxonomy provides end users with a mechanism by which they can assess the suitability of workflow in general and how they might use these features to make an informed choice about which workflow system would be a good choice for their particular application.

903 citations

Journal ArticleDOI
TL;DR: An integrated view of the Pegasus system is provided, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms.

701 citations


Cites methods from "Performance-effective and low-compl..."

  • ...The Mapper supports a variety of site selection strategies such as Random, Round Robin, and HEFT [59]....


Journal ArticleDOI
TL;DR: Two workflow scheduling algorithms are proposed which aim to minimize the workflow execution cost while meeting a deadline and have a polynomial time complexity which make them suitable options for scheduling large workflows in IaaS Clouds.

580 citations



Journal ArticleDOI
Hamid Arabnejad1, Jorge G. Barbosa1
TL;DR: The analysis and experiments show that the PEFT algorithm outperforms the state-of-the-art list-based algorithms for heterogeneous systems in terms of schedule length ratio, efficiency, and frequency of best results.
Abstract: Efficient application scheduling algorithms are important for obtaining high performance in heterogeneous computing systems. In this paper, we present a novel list-based scheduling algorithm called Predict Earliest Finish Time (PEFT) for heterogeneous computing systems. The algorithm has the same time complexity as the state-of-the-art algorithm for the same purpose, that is, O(v2.p) for v tasks and p processors, but offers significant makespan improvements by introducing a look-ahead feature without increasing the time complexity associated with computation of an optimistic cost table (OCT). The calculated value is an optimistic cost because processor availability is not considered in the computation. Our algorithm is only based on an OCT that is used to rank tasks and for processor selection. The analysis and experiments based on randomly generated graphs with various characteristics and graphs of real-world applications show that the PEFT algorithm outperforms the state-of-the-art list-based algorithms for heterogeneous systems in terms of schedule length ratio, efficiency, and frequency of best results.

460 citations


Cites background or result from "Performance-effective and low-compl..."

  • ...HEFT has a complexity of Oðv2:pÞ, where v is the number of tasks and p is the number of processors....


  • ...Index Terms—Application scheduling, DAG scheduling, random graphs generator, static scheduling Ç...


  • ...Clustering heuristics are mainly proposed for homogeneous systems to form clusters of tasks that are then assigned to processors....


  • ...In comparison with clustering algorithms, they have lower time complexity, and in comparison with task duplication strategies, their solutions represent a lower processor workload....


References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column as discussed by the authors provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1990
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Abstract: From the Publisher: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition,this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects. In its new edition,Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity,and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage. As in the classic first edition,this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further,the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds. Each chapter presents an algorithm,a design technique,an application area,or a related topic. The chapters are not dependent on one another,so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally,the new edition offers a 25% increase over the first edition in the number of problems,giving the book 155 problems and over 900 exercises thatreinforcethe concepts the students are learning.

21,651 citations

Journal ArticleDOI
TL;DR: In this article, a Logo-like language is used to draw geometric pictures, and programs are developed to draw such pictures using this language.
Abstract: The primary purpose of a programming language is to assist the programmer in the practice of her art. Each language is either designed for a class of problems or supports a different style of programming. In other words, a programming language turns the computer into a ‘virtual machine’ whose features and capabilities are unlimited. In this article, we illustrate these aspects through a language similar tologo. Programs are developed to draw geometric pictures using this language.

5,749 citations

Journal ArticleDOI
TL;DR: It is shown that the problem of finding an optimal schedule for a set of jobs is NP-complete even in two restricted cases, which is tantamount to showing that the scheduling problems mentioned are intractable.

1,356 citations