A scheme for fast parallel communication

doi:10.1137/0211027

Home
/
Papers
/
A scheme for fast parallel communication

Journal Article•DOI•

A scheme for fast parallel communication

01 May 1982-SIAM Journal on Computing (Society for Industrial and Applied Mathematics)-Vol. 11, Iss: 2, pp 350-361

TL;DR: There is a distributed randomized algorithm that can route every packet to its destination without two packets passing down the same wire at any one time, and finishes within time $O(\log N)$ with overwhelming probability for all such routing requests.

read less

Abstract: Consider $N = 2^n $ nodes connected by wires to make an n-dimensional binary cube. Suppose that initially the nodes contain one packet each addressed to distinct nodes of the cube. We show that the...

...read moreread less

Citations

PDF

Open Access

More filters

Book•

Randomized Algorithms

[...]

Rajeev Motwani¹, Prabhakar Raghavan²•Institutions (2)

Stanford University¹, IBM²

01 Jan 1995

TL;DR: This book introduces the basic concepts in the design and analysis of randomized algorithms and presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications.

...read moreread less

Abstract: For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. This book introduces the basic concepts in the design and analysis of randomized algorithms. The first part of the text presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications. Algorithmic examples are also given to illustrate the use of each tool in a concrete setting. In the second part of the book, each chapter focuses on an important area to which randomized algorithms can be applied, providing a comprehensive and representative selection of the algorithms that might be used in each of these areas. Although written primarily as a text for advanced undergraduates and graduate students, this book should also prove invaluable as a reference for professionals and researchers.

...read moreread less

4,412 citations

Cites methods from "A scheme for fast parallel communic..."

...The power of randomization in solving the permutation routing problem was first demonstrated by Valiant [403]; his analysis was subsequently simplified by Valiant and Brebner [400]....
[...]

Journal Article•DOI•

A bridging model for parallel computation

[...]

Leslie G. Valiant¹•Institutions (1)

Harvard University¹

01 Aug 1990-Communications of The ACM

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.

...read moreread less

Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled on to this model; yet it can be effeciently implemented in hardware. The author argues that an analogous bridge between software and hardware in required for parallel computation if that is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.

...read moreread less

3,885 citations

Journal Article•DOI•

Efficient dispersal of information for security, load balancing, and fault tolerance

[...]

Michael O. Rabin¹•Institutions (1)

Harvard University¹

01 Apr 1989-Journal of the ACM

TL;DR: Information Dispersal Algorithm (IDA) has numerous applications to secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission ofInformation in networks, and to communications between processors in parallel computers.

...read moreread less

Abstract: An Information Dispersal Algorithm (IDA) is developed that breaks a file F of length L = u Fu into n pieces Fi, l ≤ i ≤ n, each of length uFiu = L/m, so that every m pieces suffice for reconstructing F. Dispersal and reconstruction are computationally efficient. The sum of the lengths uFiu is (n/m) · L. Since n/m can be chosen to be close to l, the IDA is space efficient. IDA has numerous applications to secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission of information in networks, and to communications between processors in parallel computers. For the latter problem provably time-efficient and highly fault-tolerant routing on the n-cube is achieved, using just constant size buffers.

...read moreread less

2,479 citations

Book•

Fat-trees: universal networks for hardware-efficient supercomputing

[...]

Charles E. Leiserson

01 Jun 1994

TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.

...read moreread less

Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can stimulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

...read moreread less

1,227 citations

Journal Article•DOI•

Fat-trees: Universal networks for hardware-efficient supercomputing

[...]

Charles E. Leiserson¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Oct 1985-IEEE Transactions on Computers

...read moreread less

1,147 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Sorting networks and their applications

[...]

Kenneth E. Batcher¹•Institutions (1)

Goodyear Aerospace¹

30 Apr 1968

TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.

...read moreread less

Abstract: To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently. A major problem in the design of such a computing system is the connecting together of the various parts of the system (the I/O devices, memories, processing units, etc.) in such a way that all the required data transfers can be accommodated. One common scheme is a high-speed bus which is time-shared by the various parts; speed of available hardware limits this scheme. Another scheme is a cross-bar switch or matrix; limiting factors here are the amount of hardware (an m × n matrix requires m × n cross-points) and the fan-in and fan-out of the hardware.

...read moreread less

2,553 citations

Proceedings Article•DOI•

Universal schemes for parallel communication

[...]

Leslie G. Valiant, G. J. Brebner

11 May 1981

TL;DR: This paper shows that there exists an N-processor computer that can simulate arbitrary N- processor parallel computations with only a factor of O(log N) loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.

...read moreread less

Abstract: In this paper we isolate a combinatorial problem that, we believe, lies at the heart of this question and provide some encouragingly positive solutions to it. We show that there exists an N-processor realistic computer that can simulate arbitrary idealistic N-processor parallel computations with only a factor of O(log N) loss of runtime efficiency. The main innovation is an O(log N) time randomized routing algorithm. Previous approaches were based on sorting or permutation networks, and implied loss factors of order at least (log N)2.

...read moreread less

694 citations

Journal Article•DOI•

A Permutation Network

[...]

Abraham Waksman

01 Jan 1968-Journal of the ACM

TL;DR: The construction of a switching network capable of n-permutation of its input terminals to its output terminals is described and an algorithm is given for the setting of the binary cells in the network according to any specified permutation.

...read moreread less

Abstract: In this paper the construction of a switching network capable of n!-permutation of its n input terminals to its n output terminals is described. The building blocks for this network are binary cells capable of permuting their two input terminals to their two output terminals.The number of cells used by the network is 〈n · log2n - n + 1〉 = Σnk=1 〈log2k〉. It could be argued that for such a network this number of cells is a lower bound, by noting that binary decision trees in the network can resolve individual terminal assignments only and not the partitioning of the permutation set itself which requires only 〈log2n!〉 = 〈Σnk=1 log2k〉 binary decisions.An algorithm is also given for the setting of the binary cells in the network according to any specified permutation.

...read moreread less

488 citations

Book Chapter•DOI•

On the Distribution of the Number of Successes in Independent Trials

[...]

Wassily Hoeffding¹•Institutions (1)

University of North Carolina at Chapel Hill¹

01 Sep 1956-Annals of Mathematical Statistics

TL;DR: In this paper, the authors considered the problem of finding the maximum and the minimum of the expected value of a real-valued function of a function g(S) when ES = np, and showed that the variability in the number of successes is highest when the successes are equally probable.

...read moreread less

Abstract: Let S be the number of successes in n independent trials, and let p i denote the probability of success in the jth trial, j = 1, 2, …, n (Poisson trials). We consider the problem of finding the maximum and the minimum of Eg(S), the expected value of a given real-valued function of S, when ES = np is fixed. It is well known that the maximum of the variance of S is attained when p 1 = p 2 = … = p n = p This can be interpreted as showing that the variability in the number of successes is highest when the successes are equally probable (Bernoulli trials). This interpretation is further supported by the following two theorems, proved in this paper. If b and c are two integers, 0 ≦,b≦np≦c≦n, the probability P(b ≦S ≦ c) attains its minimum if and only if p 1 = p 2 = … = p n = p, unless b = 0 and c = n (Theorem 5, a corollary of Theorem 4, which gives the maximum and the minimum of P(S ≦ cc)). If g is a strictly convex function, Eg(S) attains its maximum if and only if p 1 = p 2 = … = p n = p (Theorem 3). These results are obtained with the help of two theorems concerning the extrema of the expected value of an arbitrary function g(S) under the condition ES = np. Theorem 1 gives necessary conditions for the maximum and the minimum of Eg(S). Theorem 2 gives a partial characterization of the set of points at which an extremum is attained. Corollary 2.1 states that the maximum and the minimum are attained when p 1, p 2, …, p n take on, at most, three different values, only one of which is distinct from 0 and 1. Applications of Theorems 3 and 5 to problems of estimation and testing are pointed out in Section 5.

...read moreread less

377 citations

Journal Article•DOI•

A fast parallel algorithm for routing in permutation networks

[...]

G. F. Lev¹, Leslie G. Valiant¹, Nicholas Pippenger²•Institutions (2)

University of Edinburgh¹, IBM²

01 Feb 1981-IEEE Transactions on Computers

TL;DR: An algorithm is given for routing in permutation networks-that is, for computing the switch settings that implement a given permutation.

...read moreread less

Abstract: An algorithm is given for routing in permutation networks-that is, for computing the switch settings that implement a given permutation. The algorithm takes serial time O(n(log N)2) (for one processor with random access to a memory of O(n) words) or parallel time O((log n)3) (for n synchronous processors with conflict-free random access to a common memory of O(n) words). These time bounds may be reduced by a further logarithmic factor when all of the switch sizes are integral powers of two.

...read moreread less

282 citations