
Showing papers by "Charles E. Leiserson published in 1993"


Proceedings ArticleDOI
01 Jun 1993
TL;DR: This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution, and proposes a decentralized algorithm that can compute and execute a P-processor schedule online in expected time O(T1/P + T∞ lg P) and worst-case space O(S1 P lg P).
Abstract: This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Utilizing a new graph-theoretic model of multithreaded computation, execution efficiency is quantified by three important measures: T1 is the time required to execute the computation on one processor, T∞ is the time required by an infinite number of processors, and S1 is the space required to execute the computation on one processor. A computation executed on P processors is time-efficient if the time is O(T1/P + T∞), that is, it achieves linear speedup when P = O(T1/T∞), and it is space-efficient if it uses O(S1 P) total space, that is, the space per processor is within a constant factor of that required for a one-processor execution. The first result derived from this model shows that there exist multithreaded computations such that no execution schedule can simultaneously achieve efficient time and efficient space. But by restricting attention to "strict" computations (those in which all arguments to a procedure must be available before the procedure can be invoked), much more positive results are obtainable. Specifically, for any strict multithreaded computation, a simple online algorithm can compute a schedule that is both time-efficient and space-efficient. Unfortunately, because the algorithm uses a global queue, the overhead of computing the schedule can be substantial. This problem is overcome by a decentralized algorithm that can compute and execute a P-processor schedule online in expected time O(T1/P + T∞ lg P) and worst-case space O(S1 P lg P), including overhead costs.
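The time-efficiency bound O(T1/P + T∞) is a greedy-scheduling (Brent-type) bound. A minimal sketch, assuming unit-cost tasks and a toy DAG model (the names greedy_schedule_steps and critical_path are illustrative, not from the paper), showing a greedy P-processor schedule staying within the bound:

```python
def greedy_schedule_steps(deps, P):
    """Simulate a greedy P-processor schedule of a unit-cost task DAG.

    deps maps each task to the set of tasks it depends on.
    Returns the number of parallel steps the schedule takes.
    """
    done = set()
    steps = 0
    while len(done) < len(deps):
        # A task is ready when all of its dependencies have completed.
        ready = [t for t, d in deps.items() if t not in done and d <= done]
        done.update(ready[:P])   # execute at most P ready tasks per step
        steps += 1
    return steps

def critical_path(deps):
    """Length T∞ of the longest dependency chain (unit-cost tasks)."""
    memo = {}
    def depth(t):
        if t not in memo:
            memo[t] = 1 + max((depth(d) for d in deps[t]), default=0)
        return memo[t]
    return max(depth(t) for t in deps)

# A small fork-join DAG: a spawns b, c, d; e joins them.
dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"b", "c", "d"}}
T1, Tinf, P = len(dag), critical_path(dag), 2
steps = greedy_schedule_steps(dag, P)
assert steps <= T1 / P + Tinf   # greedy bound: T ≤ T1/P + T∞
```

This centralized simulation mirrors the paper's global-queue algorithm; the decentralized algorithm achieves comparable bounds without the shared queue.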

127 citations


Patent
08 Apr 1993
TL;DR: In this article, the interconnection network establishes a path, in accordance with the address supplied by the source processor, in a downstream direction to the destination processors, thereby facilitating transfer of the message to them.
Abstract: A parallel computer comprising a plurality of processors and an interconnection network for transferring messages among the processors. At least one of the processors, as a source processor, generates messages, each including an address defining a path through the interconnection network from the source processor to one or more of the processors which are to receive the message as destination processors. The interconnection network establishes, in response to a message from the source processor, a path in accordance with the address from the source processor in a downstream direction to the destination processors thereby to facilitate transfer of the message to the destination processors. Each destination processor generates response indicia in response to a message. The interconnection network receives the response indicia from the destination processor(s) and generates, in response, consolidated response indicia which it transfers in an upstream direction to the source processor.
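The downstream fan-out and upstream consolidation described here can be sketched as a recursive walk over a routing tree. This is a toy software model, not the patented hardware; Switch, multicast, and the logical-AND consolidation are illustrative assumptions:

```python
class Switch:
    """One node of a toy routing tree; internal nodes fan out, leaves deliver."""
    def __init__(self, children=(), deliver=None):
        self.children = list(children)
        self.deliver = deliver   # at a leaf: the destination processor's handler

def multicast(node, message):
    """Send the message downstream; return consolidated response indicia upstream."""
    if node.deliver is not None:
        return node.deliver(message)            # leaf generates its response indicia
    replies = [multicast(child, message) for child in node.children]
    return all(replies)                         # consolidate (here: logical AND)

# Four destination processors that acknowledge receipt.
leaves = [Switch(deliver=lambda m: True) for _ in range(4)]
root = Switch(children=[Switch(children=leaves[:2]), Switch(children=leaves[2:])])
ok = multicast(root, "msg")   # True once every destination has acknowledged
```

The AND-combine stands in for whatever consolidation the hardware performs; the key property is that the source sees a single upstream result rather than one reply per destination.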

73 citations


Proceedings ArticleDOI
03 Nov 1993
TL;DR: A general method that substantially reduces I/O traffic for many sparse linear relaxation problems, in which each iteration of the algorithm updates the state of every vertex in a graph with a linear combination of the states of its neighbors.
Abstract: When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called "out-of-core" algorithm must be used which moves data between primary and secondary memories. In this paper, we study out-of-core algorithms for sparse linear relaxation problems in which each iteration of the algorithm updates the state of every vertex in a graph with a linear combination of the states of its neighbors. We give a general method that can save substantially on the I/O traffic for many problems. For example, our technique allows a computer with M words of primary memory to perform T = Ω(M^{1/5}) cycles of a multigrid algorithm for a two-dimensional elliptic solver over an n-point domain using only Θ(nT/M^{1/5}) I/O transfers, as compared with the naive algorithm, which requires Ω(nT) I/Os.
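The paper's bounds are for a 2-D multigrid solver; as a simplified illustration of the underlying idea, one can load each block together with a halo of width T and run T relaxation sweeps per load, trading a little extra reading for far fewer passes over secondary storage. A minimal 1-D Jacobi sketch (all names hypothetical, not the paper's algorithm):

```python
def jacobi_naive(x, T):
    """T in-memory Jacobi sweeps: each interior cell becomes its neighbors' mean."""
    x = list(x)
    for _ in range(T):
        x = [x[i] if i in (0, len(x) - 1) else (x[i - 1] + x[i + 1]) / 2
             for i in range(len(x))]
    return x

def jacobi_blocked(x, T, B):
    """Same result, processing blocks of B cells with a halo of width T.

    Loading [a-T, b+T) lets T sweeps run locally: staleness creeps in one
    cell per sweep from each cut edge, so the inner [a, b) stays exact.
    """
    n = len(x)
    out = list(x)
    for a in range(0, n, B):
        b = min(a + B, n)
        lo, hi = max(a - T, 0), min(b + T, n)
        local = list(x[lo:hi])                     # one "I/O load" per block
        for _ in range(T):
            new = list(local)
            for i in range(len(local)):
                gi = lo + i                        # global index of local cell i
                # Update only true interior cells with both neighbors in core.
                if 0 < gi < n - 1 and 0 < i < len(local) - 1:
                    new[i] = (local[i - 1] + local[i + 1]) / 2
            local = new
        out[a:b] = local[a - lo : b - lo]          # keep only the trusted part
    return out

x = [float((7 * i) % 11) for i in range(25)]
assert jacobi_blocked(x, 3, 5) == jacobi_naive(x, 3)  # identical arithmetic
```

Each load touches roughly B + 2T words and yields T sweeps for B cells, so about n(1 + 2T/B) words move per T iterations instead of the naive nT; the paper's technique obtains analogous savings for multigrid.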

44 citations