Proceedings ArticleDOI

Optimal mapping of sequences of data parallel tasks

TLDR
This paper addresses the problem of optimizing throughput in task pipelines and presents two new solution algorithms; the first, based on dynamic programming, finds the optimal mapping of k tasks onto P processors in O(P⁴k²) time.
Abstract: 
Many applications in a variety of domains including digital signal processing, image processing and computer vision are composed of a sequence of tasks that act on a stream of input data sets in a pipelined manner. Recent research has established that these applications are best mapped to a massively parallel machine by dividing the tasks into modules and assigning a subset of the available processors to each module. This paper addresses the problem of optimally mapping such applications onto a massively parallel machine. We formulate the problem of optimizing throughput in task pipelines and present two new solution algorithms. The formulation uses a general and realistic model for inter-task communication, takes memory constraints into account, and addresses the entire problem of mapping which includes clustering tasks into modules, assignment of processors to modules, and possible replication of modules. The first algorithm is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in O(P⁴k²) time. We also present a heuristic algorithm that is linear in the number of processors and establish with theoretical and practical results that the solutions obtained are optimal in practical situations. The entire framework is implemented as an automatic mapping tool for the Fx parallelizing compiler for High Performance Fortran. We present experimental results that demonstrate the importance of choosing a good mapping and show that the methods presented yield efficient mappings and predict optimal performance accurately.
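Below is a minimal, hypothetical sketch of the core idea described in the abstract: split the chain of tasks into contiguous modules, assign each module a processor count, and minimize the bottleneck module time, whose reciprocal is the pipeline throughput. It deliberately omits the paper's inter-task communication model, memory constraints, and module replication, and assumes an idealized linear speedup inside a module, so it illustrates the dynamic-programming structure rather than reproducing the O(P⁴k²) algorithm; the names map_chain, module_time, and task_costs are invented for this example.

```python
# Sketch only: clusters consecutive tasks into modules, assigns processors,
# and minimizes the slowest module's time (the pipeline bottleneck).
# Assumes ideal data-parallel speedup within a module; communication,
# memory limits, and replication from the paper are ignored.

from functools import lru_cache


def map_chain(task_costs, P):
    """Return (bottleneck_time, modules) for a chain of data parallel tasks.

    task_costs[i] is an assumed per-data-set sequential cost of task i.
    modules is a list of (first_task, one_past_last_task, processors).
    """
    k = len(task_costs)
    prefix = [0.0]
    for c in task_costs:
        prefix.append(prefix[-1] + c)

    def module_time(i, j, p):
        # Time for one data set through tasks i..j-1 grouped as one module
        # on p processors, assuming ideal data-parallel speedup.
        return (prefix[j] - prefix[i]) / p

    @lru_cache(maxsize=None)
    def best(j, q):
        # Minimal bottleneck time for the first j tasks using at most q
        # processors, plus the module decomposition that achieves it.
        if j == 0:
            return 0.0, ()
        best_time, best_split = float("inf"), ()
        for i in range(j):                  # last module covers tasks i..j-1
            max_p = q if i == 0 else q - 1  # earlier tasks need >= 1 processor
            for p in range(1, max_p + 1):
                prev_time, prev_split = best(i, q - p)
                t = max(prev_time, module_time(i, j, p))
                if t < best_time:
                    best_time, best_split = t, prev_split + ((i, j, p),)
        return best_time, best_split

    bottleneck, modules = best(k, P)
    return bottleneck, list(modules)


if __name__ == "__main__":
    # Five tasks with made-up costs mapped onto 8 processors.
    bottleneck, modules = map_chain([4.0, 1.0, 2.0, 6.0, 3.0], P=8)
    print("throughput = %.3f data sets per unit time" % (1.0 / bottleneck))
    for first, last, procs in modules:
        print("tasks %d..%d -> %d processors" % (first, last - 1, procs))
```

In the framework the abstract describes, the per-module cost would presumably come from a realistic performance model that accounts for communication and memory rather than the idealized module_time stand-in used here.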


Citations
Proceedings ArticleDOI

Optimal latency-throughput tradeoffs for data parallel pipelines

TL;DR: This paper presents a new algorithm to determine a processor mapping of a chain of tasks that optimizes latency subject to throughput constraints, and addresses the complementary problem of optimizing throughput subject to latency constraints.
Proceedings ArticleDOI

A heuristic algorithm for mapping communicating tasks on heterogeneous resources

TL;DR: A heuristic algorithm that maps data processing tasks onto heterogeneous resources (i.e. processors and links of various capacities) is presented; experiments show that it performs significantly better than communication-ignorant schedulers.
Proceedings ArticleDOI

Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster

TL;DR: Through a novel combination of performance modeling, performance prediction, and program execution, this paper finds a near-optimal schedule for all of the benchmarks in just a handful of partial program executions.
Journal ArticleDOI

A framework for exploiting task and data parallelism on distributed memory multicomputers

TL;DR: This paper explores a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism, implemented as part of the PARADIGM HPF compiler framework that the authors have developed.
Book ChapterDOI

Mapping pipeline skeletons onto heterogeneous platforms

TL;DR: It is shown that determining the optimal interval-based mapping is NP-hard for Communication Homogeneous platforms, and this result assesses the complexity of the well-known chains-to-chains problem for different-speed processors.
References
Journal ArticleDOI

High performance Fortran language specification

TL;DR: (Part I) Fortran Forum is reprinting this High Performance Fortran Language Specification over several issues; this part is devoted to the first four chapters of the HPFF Language Specification.
Book

Partitioning and Scheduling Parallel Programs for Multiprocessing

Vivek Sarkar
TL;DR: This book presents two approaches to automatic partitioning and scheduling, based on a macro-dataflow model and a compile-time scheduling model, so that the same parallel program can be made to execute efficiently on widely different multiprocessors.
Book

Assignment Problems in Parallel and Distributed Computing

TL;DR: This book motivates distributed processing of serial programs, formulates the assignment problem, and develops techniques for finding optimal assignments across space and time.
Proceedings ArticleDOI

Exploiting task and data parallelism on a multicomputer

TL;DR: A unified approach to exploiting both kinds of parallelism in a single framework with an existing language is taken, and a parallelizing Fortran compiler for the iWarp system is implemented based on this approach.
Journal ArticleDOI

Task Parallelism in a High Performance Fortran Framework

TL;DR: Exploiting both data and task parallelism in a single framework is the key to achieving good performance for a variety of applications.