Performance-effective and low-complexity task scheduling for heterogeneous computing
Summary (3 min read)
1 INTRODUCTION
- In the next section, the authors define the research problem and the related terminology.
- Section 4 introduces their scheduling algorithms (the HEFT and CPOP algorithms).
- Section 5 presents a comparison study of their algorithms with the related work, which is based on randomly generated task graphs and task graphs of several real applications.
3.1 Task-Scheduling Heuristics for Heterogeneous Environments
- The first phase groups the tasks that can be executed in parallel using the level attribute.
- The second phase assigns each task to the fastest available processor.
- Within the same level, the task with the highest computation cost has the highest priority.
- Each task is assigned to a processor that minimizes the sum of the task's computation cost and the total communication costs with tasks in the previous levels.
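A minimal sketch of this two-phase, level-based approach (illustrative only; the data structures, priority rule, and assignment rule below are assumptions modeled on the description above, not the authors' implementation):

```python
from collections import defaultdict

def schedule_by_levels(tasks, succ, comp_cost, comm_cost, num_procs):
    """Two-phase level-based scheduling sketch.

    tasks     : task ids in topological order
    succ      : dict task -> list of successor tasks
    comp_cost : dict (task, proc) -> computation cost of the task on that processor
    comm_cost : dict (src, dst) -> communication cost of the edge
    """
    # Phase 1: group tasks into levels; tasks in the same level can run in parallel.
    pred = defaultdict(list)
    for t in tasks:
        for s in succ.get(t, []):
            pred[s].append(t)
    level = {}
    for t in tasks:  # topological order guarantees predecessors are already levelled
        level[t] = 1 + max((level[p] for p in pred[t]), default=-1)
    by_level = defaultdict(list)
    for t, lvl in level.items():
        by_level[lvl].append(t)

    # Phase 2: within each level, the task with the highest (average) computation
    # cost gets the highest priority; each task goes to the processor minimizing
    # its computation cost plus communication from predecessors on other processors.
    placement = {}
    for lvl in sorted(by_level):
        avg = lambda task: sum(comp_cost[(task, p)] for p in range(num_procs)) / num_procs
        for t in sorted(by_level[lvl], key=avg, reverse=True):
            def cost_on(p):
                comm = sum(comm_cost.get((q, t), 0.0)
                           for q in pred[t] if placement.get(q) != p)
                return comp_cost[(t, p)] + comm
            placement[t] = min(range(num_procs), key=cost_on)
    return placement
```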
4.1 Graph Attributes Used by HEFT and CPOP Algorithms
- The downward ranks are computed recursively by traversing the task graph downward starting from the entry task of the graph.
- For the entry task n_entry, the downward rank value is equal to zero.
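Restating the rank definitions from the paper, with $\overline{w_i}$ the average computation cost of task $n_i$ across processors and $\overline{c_{i,j}}$ the average communication cost of edge $(n_i, n_j)$:

```latex
\mathrm{rank}_u(n_i) = \overline{w_i} + \max_{n_j \in \mathrm{succ}(n_i)} \bigl( \overline{c_{i,j}} + \mathrm{rank}_u(n_j) \bigr),
\qquad \mathrm{rank}_u(n_{exit}) = \overline{w_{exit}}

\mathrm{rank}_d(n_i) = \max_{n_j \in \mathrm{pred}(n_i)} \bigl( \mathrm{rank}_d(n_j) + \overline{w_j} + \overline{c_{j,i}} \bigr),
\qquad \mathrm{rank}_d(n_{entry}) = 0
```

HEFT orders tasks by decreasing $\mathrm{rank}_u$, while CPOP uses $\mathrm{rank}_u + \mathrm{rank}_d$ to identify and prioritize critical-path tasks.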
5 EXPERIMENTAL RESULTS AND DISCUSSION
- The authors present the comparative evaluation of their algorithms and the related work given in Section 3.1.
- For this purpose, the authors consider two sets of graphs as the workload for testing the algorithms: randomly generated application graphs and the graphs that represent some of the numerical real world problems.
- First, the authors present the metrics used for performance evaluation, which is followed by two sections on experimental results.
5.1 Comparison Metrics
- The SLR of a graph (using any algorithm) cannot be less than one, since the denominator is a lower bound on the schedule length.
- The task-scheduling algorithm that gives the lowest SLR for a graph is the best algorithm with respect to performance.
- Average SLR values over several task graphs are used in their experiments.
- The speedup value for a given graph is computed by dividing the sequential execution time (i.e., cumulative computation costs of the tasks in the graph) by the parallel execution time (i.e., the makespan of the output schedule).
- The sequential execution time is computed by assigning all tasks to the single processor that minimizes the cumulative computation cost (see the formulas below).
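In formula form (following the paper's definitions, with $w_{i,j}$ the computation cost of task $n_i$ on processor $p_j$, $Q$ the processor set, and $CP_{MIN}$ the critical path based on minimum computation costs):

```latex
SLR = \frac{\mathit{makespan}}{\sum_{n_i \in CP_{MIN}} \min_{p_j \in Q} \{ w_{i,j} \}}
\qquad\qquad
\mathit{Speedup} = \frac{\min_{p_j \in Q} \bigl\{ \sum_{n_i \in V} w_{i,j} \bigr\}}{\mathit{makespan}}
```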
Number of Occurrences of Better Quality of Schedules
- The number of times that each algorithm produced better, worse, and equal quality of schedules compared to every other algorithm is counted in the experiments.
- The running time (or the scheduling time) of an algorithm is its execution time for obtaining the output schedule of a given task graph.
- This metric gives the average scheduling cost (overhead) of each algorithm.
- Among the algorithms that give comparable SLR values, the one with the minimum running time is the most practical implementation.
- Minimizing the SLR by checking all possible task-processor pairs can conflict with minimizing the running time.
5.2 Randomly Generated Application Graphs
- In their study, the authors first considered the randomly generated application graphs.
- A random graph generator was implemented to generate weighted application DAGs with various characteristics, controlled by several input parameters (including graph size, CCR, and the range percentage).
- The authors' simulation-based framework allows sets of values to be assigned to the parameters used by the random graph generator.
- The framework first runs the random graph generator to construct the application DAGs, then executes the scheduling algorithms to produce output schedules, and finally computes the performance metrics from those schedules.
5.2.1 Random Graph Generator
- Combinations of these parameter values give 2,250 different DAG types.
- Since 25 random DAGs were generated for each DAG type, the total number of DAGs used in their experiments was 56,250.
- Varying several input parameters and selecting each from a large set of values produces diverse DAGs with various characteristics.
- Experiments based on diverse DAGs prevent biasing toward a particular scheduling algorithm.
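A self-contained sketch of what such a parameterized generator might look like (the layered construction, parameter names, and default values here are illustrative assumptions, not the authors' generator):

```python
import random

def random_dag(num_tasks, num_procs, ccr, beta, out_degree=3, avg_comp=20.0, seed=None):
    """Sketch of a parameterized random DAG generator.

    ccr  : desired communication-to-computation ratio (scales edge weights)
    beta : range percentage; a task's cost on each processor is drawn from
           [w_mean * (1 - beta/2), w_mean * (1 + beta/2)]
    """
    rng = random.Random(seed)
    num_levels = max(2, int(round(num_tasks ** 0.5)))
    # Assign tasks to levels; edges only go from earlier to later levels (keeps the graph acyclic).
    levels = [[] for _ in range(num_levels)]
    levels[0].append(0)                                   # single entry task
    for t in range(1, num_tasks):
        levels[rng.randrange(1, num_levels)].append(t)
    # Heterogeneous computation costs: per-task mean, varied per processor by beta.
    w_mean = {t: rng.uniform(0.5, 1.5) * avg_comp for t in range(num_tasks)}
    comp = {(t, p): rng.uniform(w_mean[t] * (1 - beta / 2), w_mean[t] * (1 + beta / 2))
            for t in range(num_tasks) for p in range(num_procs)}
    # Edges with weights scaled so the overall CCR is roughly the requested value.
    edges = {}
    for i, level in enumerate(levels[:-1]):
        later = [t for lvl in levels[i + 1:] for t in lvl]
        for t in level:
            for s in rng.sample(later, min(out_degree, len(later))):
                edges[(t, s)] = rng.uniform(0.5, 1.5) * ccr * avg_comp
    return comp, edges, levels
```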
5.2.2 Performance Results
- Finally, the number of times that each scheduling algorithm in the experiments produced a better, worse, or equal schedule length compared to every other algorithm was counted for the 56,250 DAGs used.
- Each cell in Table 2 indicates the comparison results of the algorithm on the left with the algorithm on the top.
- The "combined" column shows the percentage of graphs in which the algorithm on the left gives better, equal, or worse performance than all other algorithms combined (a counting sketch is given after this list).
- The ranking of the algorithms, based on occurrences of best results, is {HEFT, DLS, CPOP, MH, LMT}.
- The ranking with respect to average SLR values was: {HEFT, CPOP, DLS, MH, LMT}.
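A sketch of the counting behind such a pairwise table (the input format, names, and the toy numbers in the usage comment are assumptions, not the paper's data):

```python
from itertools import permutations

def pairwise_comparison(makespans):
    """Count how often each algorithm beats, ties, or loses to each other algorithm.

    makespans: dict algorithm_name -> list of schedule lengths (same DAG order for all).
    """
    counts = {}
    for a, b in permutations(makespans, 2):
        better = sum(x < y for x, y in zip(makespans[a], makespans[b]))
        equal = sum(x == y for x, y in zip(makespans[a], makespans[b]))
        worse = len(makespans[a]) - better - equal
        counts[(a, b)] = {"better": better, "equal": equal, "worse": worse}
    return counts

# Example with toy data:
# pairwise_comparison({"HEFT": [10, 12], "CPOP": [11, 12], "DLS": [10, 13]})
```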
5.3.1 Gaussian Elimination
- For the efficiency comparison, the number of processors used in their experiments is varied from 2 to 16 in powers of 2; the CCR and range percentage parameters take the same sets of values as before (efficiency is defined after this list).
- Fig. 9b gives efficiency comparison for Gaussian elimination graphs when the matrix size is 50.
- The HEFT and DLS algorithms have better efficiency than the other algorithms.
- Since the matrix size is fixed, an increase in the number of processors decreases the makespan for each algorithm.
- As an example, when the matrix size is 50 for 16 processors, the DLS algorithm takes 16.2 times longer than the HEFT algorithm to schedule a given graph.
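Efficiency here is taken to be the standard ratio of speedup to the number of processors used (assumed definition):

```latex
\mathit{Efficiency} = \frac{\mathit{Speedup}}{\text{number of processors used}}
```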
5.3.3 Molecular Dynamics Code
- This application is part of their performance evaluation since it has an irregular task graph.
- Since the number of tasks in the application is fixed and the structure of the graph is known, only the CCR and range percentage parameters (defined in Section 5.2) are varied in these experiments.
- Fig. 14a shows the performance of the algorithms with respect to five different CCR values when the number of processors is equal to six.
- It was also observed that the DLS and LMT algorithms take a running time almost three times longer than the other three algorithms (HEFT, CPOP, and MH).
6 ALTERNATE POLICIES FOR THE PHASES OF THE HEFT ALGORITHM
- The original HEFT algorithm outperforms these alternates for small CCR graphs.
- For high CCR graphs, some benefit has been observed by taking critical child tasks into account during processor selection.
- For graphs in the higher CCR range, the B1 policy slightly outperforms the original HEFT algorithm.
- At the highest CCR values, the B2 policy outperforms the original algorithm and the other alternates by 4 percent.
7 CONCLUSIONS
- An extension to the case of a bounded number of processors may provide bounds on the degradation of the makespan when the available processors are not sufficient.
- The authors plan to extend the HEFT Algorithm for rescheduling tasks in response to changes in processor and network loads.
- They also plan to extend these algorithms to arbitrarily connected networks by considering link contention.