
Showing papers by "Charles E. Leiserson" published in 1994


Book
01 Jun 1994
TL;DR: In this article, the author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proves that a fat-tree of a given size is nearly the best routing network of that size.
Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

1,227 citations
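
To make the capacity-scaling idea concrete, here is a minimal sketch of a binary fat-tree, assuming (purely for illustration, not from the paper) that per-channel capacity starts at 1 at the leaves and doubles at each level toward the root, capped by a root_cap parameter that stands in for the amount of simultaneous communication the network is provisioned to support.

```python
# Minimal sketch: per-level channel capacities of a binary fat-tree.
# Assumptions (illustrative, not from the paper): leaf channels have
# capacity 1, capacity doubles at each level toward the root, and a
# root_cap parameter caps the growth to model how much simultaneous
# communication the network is provisioned for.

def fat_tree_capacities(num_leaves: int, root_cap: int) -> list[int]:
    """Return the per-channel capacity at each level, leaves first."""
    assert num_leaves > 0 and (num_leaves & (num_leaves - 1)) == 0, \
        "this sketch assumes a power-of-two number of leaves"
    capacities = []
    cap, level_width = 1, num_leaves
    while level_width >= 1:
        capacities.append(min(cap, root_cap))
        cap *= 2            # channels get fatter toward the root ...
        level_width //= 2   # ... but there are fewer of them
    return capacities

if __name__ == "__main__":
    for level, cap in enumerate(fat_tree_capacities(num_leaves=16, root_cap=8)):
        print(f"level {level}: per-channel capacity {cap}")
```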


Proceedings ArticleDOI
20 Nov 1994
TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time T_P to execute a fully strict computation on P processors using this work-stealing scheduler is T_P = O(T_1/P + T_∞), where T_1 is the minimum serial execution time of the multithreaded computation and T_∞ is the minimum execution time with an infinite number of processors.
Abstract: This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the expected time T_P to execute a fully strict computation on P processors using our work-stealing scheduler is T_P = O(T_1/P + T_∞), where T_1 is the minimum serial execution time of the multithreaded computation and T_∞ is the minimum execution time with an infinite number of processors. Moreover, the space S_P required by the execution satisfies S_P ≤ S_1·P. We also show that the expected total communication of the algorithm is at most O(T_∞ S_max P), where S_max is the size of the largest activation record of any thread, thereby justifying the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.

660 citations
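
The scheduling discipline itself is easy to sketch. Below is a minimal, single-threaded simulation of randomized work stealing, assuming a toy model (not the paper's formal multithreaded-computation model) in which every task is one unit of work and all work starts on processor 0; the point illustrated is the deque discipline, with each owner working from the bottom of its own deque while idle processors steal from the top of a random victim's deque.

```python
# Minimal single-threaded simulation of randomized work stealing.
# Toy model (an assumption, not the paper's): every task takes one unit
# of time and all work initially sits on processor 0's deque.
import random
from collections import deque

def work_stealing_schedule(tasks, num_procs, seed=0):
    """Return the number of parallel steps the toy simulation takes."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_procs)]
    for t in tasks:                      # all work starts on processor 0
        deques[0].append(t)
    total = len(deques[0])
    completed, steps = 0, 0
    while completed < total:
        steps += 1
        for p in range(num_procs):
            if deques[p]:
                deques[p].pop()          # owner works from the bottom of its deque
                completed += 1
            else:
                victim = rng.randrange(num_procs)
                if victim != p and deques[victim]:
                    # idle processor steals one task from the top of a random victim
                    deques[p].append(deques[victim].popleft())
    return steps

if __name__ == "__main__":
    print("parallel steps on 4 processors:", work_stealing_schedule(range(100), 4))
```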


Book
01 Jun 1994
TL;DR: The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second).

200 citations


Patent
14 Jan 1994
TL;DR: In this paper, a message generator performs an address translation operation in connection with the address data and the contents of the address translation table to generate updated address data, which it uses in generating address information for the message.
Abstract: A digital computer comprising a plurality of message generating nodes interconnected by a routing network. The routing network transfers messages among the message generating elements in accordance with address information identifying a destination message generating element. Each message generating node includes a message data generator and a network interface. The message data generator generates message data items each including an address data portion comprising a destination identifier. The network interface includes a message generator and an address translation table, the table including a plurality of entries identifying, for at least one destination identifier, a translated destination identifier. The message generator, in response to the receipt of a message data item from the message data generator, generates a message for transmission to the routing network. In generating the message, the message generator performs an address translation operation in connection with the address data and the contents of the address translation table to generate updated address data which it uses in connection with generating address information for the message.

39 citations
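
A minimal sketch of the translation step described in this patent abstract, assuming a dictionary-backed translation table and simple record types (MessageDataItem, Message) that are illustrative rather than taken from the patent:

```python
# Sketch of a network interface that rewrites a message data item's
# destination identifier through an address translation table before
# forming the address used by the routing network.  The record types and
# dict-based table are assumptions made for illustration.
from dataclasses import dataclass

@dataclass
class MessageDataItem:
    dest_id: int      # logical destination identifier from the message data generator
    payload: bytes

@dataclass
class Message:
    network_address: int   # translated identifier handed to the routing network
    payload: bytes

def generate_message(item: MessageDataItem, translation_table: dict) -> Message:
    # Fall back to the untranslated identifier if the table has no entry for it.
    translated = translation_table.get(item.dest_id, item.dest_id)
    return Message(network_address=translated, payload=item.payload)

if __name__ == "__main__":
    table = {3: 17, 4: 42}   # e.g., logical node 3 currently maps to physical node 17
    print(generate_message(MessageDataItem(dest_id=3, payload=b"hello"), table))
```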


Book
01 Jun 1994
TL;DR: In this article, the problem of efficiently permuting data stored in VLSI chips in accordance with a predetermined set of permutations is explored, and it is shown that the number of pins per chip can often be reduced.
Abstract: The problem of efficiently permuting data stored in VLSI chips in accordance with a predetermined set of permutations is explored. By connecting chips with shared bus interconnections, as opposed to point-to-point interconnections, it is shown that the number of pins per chip can often be reduced. As an example, for infinitely many n, the authors exhibit permutation architectures that can realize any of the n cyclic shifts on n chips in one clock tick, where the upper limit on the number of pins per chip is the greatest integer >

30 citations
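
For the bussed permutation architecture above, the sketch below checks only the connectivity condition such a design must satisfy to realize all n cyclic shifts: every chip i must share a bus with chip (i + s) mod n for every shift s. It models a bus simply as a set of chip indices and deliberately ignores the one-message-per-bus-per-tick scheduling constraint, so it is an illustrative filter, not the paper's construction or its pin-count bound.

```python
# Necessary-condition check for a bussed cyclic shifter: for every shift s
# and every chip i, some bus must contain both i and (i + s) mod n.
# The per-tick scheduling constraint (one message per bus) is ignored here.

def covers_all_cyclic_shifts(n, buses):
    for s in range(n):                    # the n cyclic shifts
        for i in range(n):                # chip i must reach chip (i + s) mod n
            j = (i + s) % n
            if not any(i in bus and j in bus for bus in buses):
                return False
    return True

if __name__ == "__main__":
    n = 8
    # A single global bus satisfies the connectivity condition with one pin
    # per chip, though it could not carry n messages in one tick.
    print(covers_all_cyclic_shifts(n, [set(range(n))]))                    # True
    # Disjoint pairs fail: e.g., chip 1 cannot reach chip 2 (shift by 1).
    print(covers_all_cyclic_shifts(n, [{0, 1}, {2, 3}, {4, 5}, {6, 7}]))   # False
```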



Patent
14 Jan 1994
TL;DR: In this article, a computer comprising a plurality of processing nodes, a control node and a tree-structured request distribution network is proposed, where the control node generates processing requests for transfer to selected ones of the processing nodes as identified by associated request address information and receives processed data in response.
Abstract: A computer comprising a plurality of processing nodes, a control node and a request distribution network. Each processing node receives processing requests and generates in response processed data. The control node generates processing requests for transfer to selected ones of the processing nodes as identified by associated request address information, and receives processed data in response, the request address information identifying selected ones of the processing nodes to receive a processing request in parallel. The request distribution network distributes the processing requests to the processing nodes and returns processed data to the control node. The network includes a plurality of request distribution nodes connected in a plurality of levels to form a tree-structure, including an upper root level and a lower leaf level. Each request distribution node is connected to receive processing requests from, and to couple processed data to, a parent, the parent of the request distribution node of the root level comprising the control node, and each request distribution node being further connected to couple processing requests to, and receive processed data from, selected children, the children of the request distribution nodes of the leaf level comprising the processing nodes. Each request distribution node, in response to request address information received from its parent, identifies selected ones of its children and thereafter couples the further request address information and processing requests it receives to those children in parallel.

21 citations
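
A minimal sketch of the distribution tree in this patent abstract, assuming the request address information is encoded as a set of leaf indices and processed data is returned as a list of strings; the class names and recursive traversal are illustrative, not the patent's hardware.

```python
# Sketch of a request-distribution tree: interior nodes forward a request
# only toward the addressed leaves and pass processed data back toward the
# control node at the root.  The set-of-leaf-indices addressing is an
# assumption made for illustration.

class ProcessingNode:
    def __init__(self, index):
        self.index = index

    def leaf_indices(self):
        return {self.index}

    def handle(self, selected, request):
        # A leaf node processes the request and returns its processed data.
        return [f"node {self.index} processed {request!r}"]

class RequestDistributionNode:
    def __init__(self, children):
        self.children = children   # child distribution nodes or processing nodes

    def leaf_indices(self):
        return set().union(*(child.leaf_indices() for child in self.children))

    def handle(self, selected, request):
        results = []
        for child in self.children:
            if child.leaf_indices() & selected:    # forward only toward addressed leaves
                results.extend(child.handle(selected, request))
        return results                             # processed data flows back up

if __name__ == "__main__":
    leaves = [ProcessingNode(i) for i in range(4)]
    root = RequestDistributionNode([RequestDistributionNode(leaves[:2]),
                                    RequestDistributionNode(leaves[2:])])
    # The control node addresses leaves 1 and 3 in parallel.
    print(root.handle({1, 3}, "partial-sum"))
```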


Patent
14 Jan 1994
TL;DR: In this article, the routing network comprises a plurality of interconnected router nodes, at least some of said router nodes being connected to the processors to receive messages therefrom and transmit messages thereto.
Abstract: A computer including a processor array and a routing network. Processors in the processor array generate messages for transfer over the routing network, each message including a path identifier portion identifying a path from a source message processor to a destination processor. The routing network comprises a plurality of interconnected router nodes, at least some of said router nodes being connected to the processors to receive messages therefrom and transmit messages thereto. Each router node operates in a plurality of modes. In a first mode, the router nodes couple received messages to a router node connected thereto in accordance with the path identifier portion, thereby transferring each respective message along the path identified in its path identifier portion. In a second mode, the router nodes couple received messages to predetermined ones of the router nodes or processors connected thereto, the predetermined ones of said router nodes or processors being selected to transfer each message to a nearby processor and thereby facilitate the rapid emptying of the routing network of messages. A control element controls the router nodes to enable them to operate in the first mode or the second mode generally contemporaneously.

14 citations
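
A minimal sketch of the two router-node modes in this patent abstract, assuming a source-routed path given as a list of next-hop names and one processor attached to each router node; treating the second mode as "deliver to the locally attached processor" is a simplification of the patent's nearby-processor rule.

```python
# Sketch of a router node with two modes: in normal mode it forwards a
# message along the next hop named in its path identifier; in drain mode
# it hands every message to its locally attached processor so the network
# empties quickly.  The path encoding and drain target are assumptions.
from enum import Enum

class Mode(Enum):
    NORMAL = 1
    DRAIN = 2

class RouterNode:
    def __init__(self, name, neighbors, local_processor):
        self.name = name
        self.neighbors = neighbors          # next-hop name -> RouterNode
        self.local_processor = local_processor
        self.mode = Mode.NORMAL

    def route(self, path, payload):
        """Return the identity of the processor that finally receives the message."""
        if self.mode is Mode.DRAIN or not path:
            return self.local_processor     # deliver nearby; empties the network
        next_hop, rest = path[0], path[1:]
        return self.neighbors[next_hop].route(rest, payload)

if __name__ == "__main__":
    b = RouterNode("B", neighbors={}, local_processor="P_B")
    a = RouterNode("A", neighbors={"B": b}, local_processor="P_A")
    print(a.route(["B"], "msg1"))   # normal mode: follows the path, delivered at P_B
    a.mode = Mode.DRAIN
    print(a.route(["B"], "msg2"))   # drain mode: delivered to the nearby processor P_A
```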