Journal ArticleDOI

A scheme for fast parallel communication

01 May 1982-SIAM Journal on Computing (Society for Industrial and Applied Mathematics)-Vol. 11, Iss: 2, pp 350-361
TL;DR: There is a distributed randomized algorithm that can route every packet to its destination without two packets passing down the same wire at any one time, and that finishes within time O(log N) with overwhelming probability for all such routing requests.
Abstract: Consider $N = 2^n $ nodes connected by wires to make an n-dimensional binary cube. Suppose that initially the nodes contain one packet each addressed to distinct nodes of the cube. We show that the...
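The two-phase idea the abstract describes (send each packet to a uniformly random intermediate node, then on to its true destination, correcting one address bit per step) can be sketched as a toy simulation. This is a minimal congestion count, not a full queueing simulation of the paper's algorithm; the function names and the congestion measure are illustrative.

```python
import random

def bit_fix_path(src, dst, n):
    """Edges of the hypercube walk that corrects the address bits of
    src toward dst, least-significant bit first ("bit fixing")."""
    path, cur = [], src
    for i in range(n):
        if (cur ^ dst) >> i & 1:
            nxt = cur ^ (1 << i)
            path.append((cur, nxt))
            cur = nxt
    return path

def max_congestion(perm, n):
    """Two-phase randomized routing: each packet first travels to a
    random intermediate node, then to its real destination. Returns
    the maximum number of packets crossing any one directed wire."""
    load = {}
    for src, dst in enumerate(perm):
        mid = random.randrange(1 << n)
        for edge in bit_fix_path(src, mid, n) + bit_fix_path(mid, dst, n):
            load[edge] = load.get(edge, 0) + 1
    return max(load.values(), default=0)

# Permutations that are bad for deterministic bit fixing (here, the
# address complement) behave well once randomized: the wire congestion
# stays around O(n) = O(log N) rather than growing polynomially in N.
n = 8
perm = [(1 << n) - 1 - i for i in range(1 << n)]
print(max_congestion(perm, n))
```

Running this a few times shows the maximum load concentrating near a small multiple of n, which is the phenomenon behind the O(log N) bound.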
Citations
Book
01 Jan 1995
TL;DR: This book introduces the basic concepts in the design and analysis of randomized algorithms and presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications.
Abstract: For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. This book introduces the basic concepts in the design and analysis of randomized algorithms. The first part of the text presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications. Algorithmic examples are also given to illustrate the use of each tool in a concrete setting. In the second part of the book, each chapter focuses on an important area to which randomized algorithms can be applied, providing a comprehensive and representative selection of the algorithms that might be used in each of these areas. Although written primarily as a text for advanced undergraduates and graduate students, this book should also prove invaluable as a reference for professionals and researchers.

4,412 citations


Cites methods from "A scheme for fast parallel communic..."

  • ...The power of randomization in solving the permutation routing problem was first demonstrated by Valiant [403]; his analysis was subsequently simplified by Valiant and Brebner [400]....


Journal ArticleDOI
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.
Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled on to this model; yet it can be efficiently implemented in hardware. The author argues that an analogous bridge between software and hardware is required for parallel computation if that is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.

3,885 citations

Journal ArticleDOI
TL;DR: Information Dispersal Algorithm (IDA) has numerous applications to secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission of information in networks, and to communications between processors in parallel computers.
Abstract: An Information Dispersal Algorithm (IDA) is developed that breaks a file F of length L = |F| into n pieces F_i, 1 ≤ i ≤ n, each of length |F_i| = L/m, so that every m pieces suffice for reconstructing F. Dispersal and reconstruction are computationally efficient. The sum of the lengths |F_i| is (n/m) · L. Since n/m can be chosen to be close to 1, the IDA is space efficient. IDA has numerous applications to secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission of information in networks, and to communications between processors in parallel computers. For the latter problem provably time-efficient and highly fault-tolerant routing on the n-cube is achieved, using just constant size buffers.

2,479 citations
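The dispersal/reconstruction arithmetic can be illustrated with a minimal sketch over the prime field GF(257) (257 is prime, so every byte value 0..255 is a field element). The piece indexing, helper names, and Vandermonde construction here are one standard way to realize the "any m of n pieces" property, chosen for the example; Rabin's paper does not prescribe this exact code.

```python
P = 257  # prime modulus; bytes 0..255 all fit as field elements

def disperse(data, n, m):
    """Split data into n pieces of length ceil(len/m) each, any m of
    which suffice to rebuild it. Piece i stores, per m-byte chunk, the
    dot product of the chunk with the row (1, i, i^2, ..., i^(m-1))."""
    data = list(data) + [0] * (-len(data) % m)   # zero-pad to a multiple of m
    pieces = {i: [] for i in range(1, n + 1)}    # evaluation points x_i = i
    for k in range(0, len(data), m):
        chunk = data[k:k + m]
        for i in pieces:
            pieces[i].append(sum(c * pow(i, j, P) for j, c in enumerate(chunk)) % P)
    return pieces

def reconstruct(pieces, m):
    """Rebuild the file from any m surviving pieces by inverting the
    m x m Vandermonde system for each chunk."""
    xs = sorted(pieces)[:m]
    rows = [[pow(x, j, P) for j in range(m)] for x in xs]
    out = []
    for k in range(len(pieces[xs[0]])):
        rhs = [pieces[x][k] for x in xs]
        out.extend(solve_mod(rows, rhs))
    return bytes(out)

def solve_mod(a, b):
    """Gaussian elimination mod P on a copy of the augmented matrix."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)       # Fermat inverse, P prime
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]
```

Dispersing with n = 5, m = 3 tolerates the loss of any two pieces, and each piece has length L/m, for a total of (n/m) · L, matching the abstract's space bound (zero-padding aside).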

Book
01 Jun 1994
TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

1,227 citations

Journal ArticleDOI
TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

1,147 citations

References
Proceedings ArticleDOI
30 Apr 1968
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
Abstract: To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently. A major problem in the design of such a computing system is the connecting together of the various parts of the system (the I/O devices, memories, processing units, etc.) in such a way that all the required data transfers can be accommodated. One common scheme is a high-speed bus which is time-shared by the various parts; speed of available hardware limits this scheme. Another scheme is a cross-bar switch or matrix; limiting factors here are the amount of hardware (an m × n matrix requires m × n cross-points) and the fan-in and fan-out of the hardware.

2,553 citations

Proceedings ArticleDOI
11 May 1981
TL;DR: This paper shows that there exists an N-processor computer that can simulate arbitrary N- processor parallel computations with only a factor of O(log N) loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.
Abstract: In this paper we isolate a combinatorial problem that, we believe, lies at the heart of this question and provide some encouragingly positive solutions to it. We show that there exists an N-processor realistic computer that can simulate arbitrary idealistic N-processor parallel computations with only a factor of O(log N) loss of runtime efficiency. The main innovation is an O(log N) time randomized routing algorithm. Previous approaches were based on sorting or permutation networks, and implied loss factors of order at least (log N)2.

694 citations

Journal ArticleDOI
TL;DR: The construction of a switching network capable of realizing any of the n! permutations of its n input terminals onto its n output terminals is described, and an algorithm is given for setting the binary cells in the network according to any specified permutation.
Abstract: In this paper the construction of a switching network capable of any of the n! permutations of its n input terminals onto its n output terminals is described. The building blocks for this network are binary cells capable of permuting their two input terminals to their two output terminals. The number of cells used by the network is ⌈n · log₂n − n + 1⌉ = Σ_{k=1}^{n} ⌈log₂k⌉. It could be argued that for such a network this number of cells is a lower bound, by noting that binary decision trees in the network can resolve individual terminal assignments only and not the partitioning of the permutation set itself, which requires only ⌈log₂n!⌉ = ⌈Σ_{k=1}^{n} log₂k⌉ binary decisions. An algorithm is also given for the setting of the binary cells in the network according to any specified permutation.

488 citations
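The cell count quoted in the abstract can be checked numerically. A sketch, assuming the closed form n·log₂n − n + 1 is taken at n a power of two (where log₂n is an integer and the closed form agrees exactly with the sum of ceilings):

```python
import math

def cells_closed(n):
    """Closed form n*log2(n) - n + 1; exact when n = 2**t."""
    return n * int(math.log2(n)) - n + 1

def cells_sum(n):
    """General cell count as a sum of ceilings, per the abstract."""
    return sum(math.ceil(math.log2(k)) for k in range(1, n + 1))

# The two expressions coincide for every power-of-two network size.
for t in range(1, 11):
    assert cells_closed(1 << t) == cells_sum(1 << t)
print(cells_sum(8))  # 8*3 - 8 + 1 = 17 cells for an 8-input network
```

For comparison, an 8-input crossbar would need 64 crosspoints, so the n log n cell count is a substantial saving.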

Book ChapterDOI
TL;DR: The authors consider the problem of finding the maximum and the minimum of Eg(S), the expected value of a real-valued function of the number of successes S, when ES = np is fixed, and show that the variability in the number of successes is highest when the successes are equally probable.
Abstract: Let S be the number of successes in n independent trials, and let p_j denote the probability of success in the jth trial, j = 1, 2, …, n (Poisson trials). We consider the problem of finding the maximum and the minimum of Eg(S), the expected value of a given real-valued function of S, when ES = np is fixed. It is well known that the maximum of the variance of S is attained when p_1 = p_2 = … = p_n = p. This can be interpreted as showing that the variability in the number of successes is highest when the successes are equally probable (Bernoulli trials). This interpretation is further supported by the following two theorems, proved in this paper. If b and c are two integers, 0 ≦ b ≦ np ≦ c ≦ n, the probability P(b ≦ S ≦ c) attains its minimum if and only if p_1 = p_2 = … = p_n = p, unless b = 0 and c = n (Theorem 5, a corollary of Theorem 4, which gives the maximum and the minimum of P(S ≦ c)). If g is a strictly convex function, Eg(S) attains its maximum if and only if p_1 = p_2 = … = p_n = p (Theorem 3). These results are obtained with the help of two theorems concerning the extrema of the expected value of an arbitrary function g(S) under the condition ES = np. Theorem 1 gives necessary conditions for the maximum and the minimum of Eg(S). Theorem 2 gives a partial characterization of the set of points at which an extremum is attained. Corollary 2.1 states that the maximum and the minimum are attained when p_1, p_2, …, p_n take on at most three different values, only one of which is distinct from 0 and 1. Applications of Theorems 3 and 5 to problems of estimation and testing are pointed out in Section 5.

377 citations
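The variance claim at the center of the abstract is easy to check numerically: Var(S) = Σ p_j(1 − p_j) for independent trials, and with the mean Σ p_j held fixed it is largest when all p_j are equal. The probability vectors below are illustrative values, not from the paper.

```python
def variance(ps):
    """Var(S) for independent trials with success probabilities ps."""
    return sum(p * (1 - p) for p in ps)

equal = [0.5] * 4                 # Bernoulli trials, E[S] = 2
skewed = [0.9, 0.7, 0.3, 0.1]     # same mean E[S] = 2, unequal p_j
print(variance(equal), variance(skewed))
```

The equal-probability vector gives variance 1.0, strictly larger than any unequal vector with the same mean, in line with the Bernoulli-trials interpretation.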

Journal ArticleDOI
TL;DR: An algorithm is given for routing in permutation networks, that is, for computing the switch settings that implement a given permutation.
Abstract: An algorithm is given for routing in permutation networks, that is, for computing the switch settings that implement a given permutation. The algorithm takes serial time O(n(log n)²) (for one processor with random access to a memory of O(n) words) or parallel time O((log n)³) (for n synchronous processors with conflict-free random access to a common memory of O(n) words). These time bounds may be reduced by a further logarithmic factor when all of the switch sizes are integral powers of two.

282 citations