An introduction to parallel algorithms

Home
/
Papers
/
An introduction to parallel algorithms

Book•

An introduction to parallel algorithms

Joseph JaJa¹•Institutions (1)

01 Oct 1992-

TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, with the emphasis on the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.

read less

Abstract: Written by an authority in the field, this book provides an introduction to the design and analysis of parallel algorithms. The emphasis is on the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis. Special attention is given to the selection of relevant data structures and to algorithm design principles that have proved to be useful. Features *Uses PRAM (parallel random access machine) as the model for parallel computation. *Covers all essential classes of parallel algorithms. *Rich exercise sets. *Written by a highly respected author within the field. 0201548569B04062001

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Reconstructing a Minimum Spanning Tree after Deletion of Any Node

[...]

Bevan Narayan Das¹, Michael C. Loui²•Institutions (2)

Alcatel-Lucent¹, University of Illinois at Urbana–Champaign²

01 Dec 2001-Algorithmica

TL;DR: This paper considers single node deletions in MSTs, and presents a sequential algorithm and a parallel algorithm that find the minimum weight set of edges R(v) for all V simultaneously.

...read moreread less

Abstract: . Updating a minimum spanning tree (MST) is a basic problem for communication networks. In this paper we consider single node deletions in MSTs. Let G=(V,E) be an undirected graph with n nodes and m edges, and let T be the MST of G . For each node v in V , the node replacement for v is the minimum weight set of edges R(v) that connect the components of T-v . We present a sequential algorithm and a parallel algorithm that find R(v) for all V simultaneously. The sequential algorithm takes O(m log n) time, but only O(m α (m,n)) time when the edges of E are presorted by weight. The parallel algorithm takes O(log 2 n) time using m processors on a CREW PRAM.

...read moreread less

25 citations

Cites methods from "An introduction to parallel algorit..."

...To evaluate the expression of hv.w/ for each supernode, Sollin’s algorithm, as implemented in say [ 22 ], uses a rake-and-compress algorithm [23]....
[...]
...FNRP uses a parallel prefix update algorithm [ 22 ] to determine rep kC1 .v 0 / and to update that value to all processors involved in the computation....
[...]
...O.log n/ iterations using n 2 processors on a CREW PRAM [ 22 ]....
[...]
...In total, for all supernodes in iteration k, this parallel prefix update algorithm takes O.log n/ time using n processors on a CREW PRAM [ 22 ]....
[...]

Book Chapter•DOI•

Randomized Parallel List Ranking for Distributed Memory Multiprocessors

[...]

Frank Dehne¹, Siang Wun Song²•Institutions (2)

Carleton University¹, University of São Paulo²

02 Dec 1996

TL;DR: A randomized parallel list ranking algorithm for distributed memory multiprocessors that is bounded, with high probability, by 118 and conjecture that the actual number of communications rounds will not exceed 50.

...read moreread less

Abstract: We present a randomized parallel list ranking algorithm for distributed memory multiprocessors. A simple version requires, with high probability, log(3p)+log ln(n)=O(log p+log log n) communication rounds (h-relations with h=O(n/p)) and O(n/p) local computation. An improved version requires, with high probability, only r ≤ (4k+6) log (2/3p)+8=O(k log p) communication rounds where k= min{i ≥ 0¦ln(i+1)n ≤ (2/3p)2i+1}. Note that k < ln*(n) is an extremely small number. For \(n \leqslant 10^{10^{100} }\)and p ≥ 4, the value of k is at most 2. For a given number of processors, p, the number of communication rounds required is, for all practical purposes, independent of n. For \(n \leqslant 10^{10^{100} }\)and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 118. We conjecture that the actual number of communications rounds will not exceed 50.

...read moreread less

25 citations

Book Chapter•DOI•

Computable Quantifiers and Logics over Finite Structures

[...]

Johann A. Makowsky¹, Yachin Pnueli¹•Institutions (1)

Technion – Israel Institute of Technology¹

01 Jan 1995

TL;DR: The notion of computable quantifiers over finite structures is explored and used to give a unified treatment of the theory of computability queries in databases and logics capturing complexity classes.

...read moreread less

Abstract: We explore the notion of computable quantifiers over finite structures and use them to give a unified treatment of the theory of computable queries in databases and logics capturing complexity classes. We use this framework also to discuss generalized Ehrenfeucht—Fraisse games and their applications to complexity theory.

...read moreread less

25 citations

Proceedings Article•DOI•

Nearly work-efficient parallel algorithm for digraph reachability

[...]

Jeremy T. Fineman¹•Institutions (1)

Georgetown University¹

20 Jun 2018

TL;DR: A surprisingly simple sequential algorithm that achieves the stated diameter reduction and runs in Õ(m) time and can be extended to produce a directed spanning tree, determine whether the graph is acyclic, topologically sort the strongly connected components of the graph, or produce adirected ear decomposition.

...read moreread less

Abstract: One of the simplest problems on directed graphs is that of identifying the set of vertices reachable from a designated source vertex. This problem can be solved easily sequentially by performing a graph search, but efficient parallel algorithms have eluded researchers for decades. For sparse high-diameter graphs in particular, there is no known work-efficient parallel algorithm with nontrivial parallelism. This amounts to one of the most fundamental open questions in parallel graph algorithms: Is there a parallel algorithm for digraph reachability with nearly linear work? This paper shows that the answer is yes. This paper presents a randomized parallel algorithm for digraph reachability and related problems with expected work O(m) and span O(n2/3), and hence parallelism Ω(m/n2/3) = Ω(n1/3), on any graph with n vertices and m arcs. This is the first parallel algorithm having both nearly linear work and strongly sublinear span, i.e., span O(n1−є) for any constant є>0. The algorithm can be extended to produce a directed spanning tree, determine whether the graph is acyclic, topologically sort the strongly connected components of the graph, or produce a directed ear decomposition, all with work O(m) and span O(n2/3). The main technical contribution is an efficient Monte Carlo algorithm that, through the addition of O(n) shortcuts, reduces the diameter of the graph to O(n2/3) with high probability. While both sequential and parallel algorithms are known with those combinatorial properties, even the sequential algorithms are not efficient, having sequential runtime Ω(mnΩ(1)). This paper presents a surprisingly simple sequential algorithm that achieves the stated diameter reduction and runs in O(m) time. Parallelizing that algorithm yields the main result, but doing so involves overcoming several other challenges.

...read moreread less

25 citations

Journal Article•DOI•

Parallel computation of the Euclidean distance transform on a three-dimensional image array

[...]

Yu-Hua Lee, Shi-Jinn Horng¹, J. Seltzer•Institutions (1)

National Taiwan University of Science and Technology¹

01 Mar 2003-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This work develops a parallel algorithm for the three-dimensional Euclidean distance transform (3D-EDT) on the EREW PRAM computation model and implements the proposed algorithms sequentially, the performance of which exceeds the existing algorithms.

...read moreread less

Abstract: In a two- or three-dimensional image array, the computation of Euclidean distance transform (EDT) is an important task. With the increasing application of 3D voxel images, it is useful to consider the distance transform of a 3D digital image array. Because the EDT computation is a global operation, it is prohibitively time consuming when performing the EDT for image processing. In order to provide the efficient transform computations, parallelism is employed. We first derive several important geometry relations and properties among parallel planes. We then, develop a parallel algorithm for the three-dimensional Euclidean distance transform (3D-EDT) on the EREW PRAM computation model. The time complexity of our parallel algorithm is O(log/sup 2/ N) for an N/spl times/N/spl times/N image array and this is currently the best known result. A generalized parallel algorithm for the 3D-EDT is also proposed. We implement the proposed algorithms sequentially, the performance of which exceeds the existing algorithms (proposed by Yamada, 1984). Finally, we develop the corresponding parallel programs on both the emulated EREW PRAM model computer and the IBM SP2 to verify the speed-up properties of the proposed algorithms.

...read moreread less

25 citations

Cites background from "An introduction to parallel algorit..."

...The interested reader can refer to [1], [2], [17] for further discussion on the massively parallel processing models, architectures, and algorithms....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
…
57
58
59
60
61
62
63
…
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Book•

Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes

[...]

F. Thomson Leighton

01 Sep 1991

TL;DR: This chapter discusses sorting on a Linear Array with a Systolic and Semisystolic Model of Computation, which automates the very labor-intensive and therefore time-heavy and expensive process of manually sorting arrays.

...read moreread less

Abstract: Preface Acknowledgments Notation 1 Arrays and Trees 1.1 Elementary Sorting and Counting 1.1.1 Sorting on a Linear Array Assessing the Performance of the Algorithm Sorting N Numbers with Fewer Than N Processors 1.1.2 Sorting in the Bit Model 1.1.3 Lower Bounds 1.1.4 A Counterexample-Counting 1.1.5 Properties of the Fixed-Connection Network Model 1.2 Integer Arithmetic 1.2.1 Carry-Lookahead Addition 1.2.2 Prefix Computations-Segmented Prefix Computations 1.2.3 Carry-Save Addition 1.2.4 Multiplication and Convolution 1.2.5 Division and Newton Iteration 1.3 Matrix Algorithms 1.3.1 Elementary Matrix Products 1.3.2 Algorithms for Triangular Matrices 1.3.3 Algorithms for Tridiagonal Matrices -Odd-Even Reduction -Parallel Prefix Algorithms 1.3.4 Gaussian Elimination 1.3.5 Iterative Methods -Jacobi Relaxation -Gauss-Seidel Relaxation Finite Difference Methods -Multigrid Methods 1.4 Retiming and Systolic Conversion 1.4.1 A Motivating Example-Palindrome Recognition 1.4.2 The Systolic and Semisystolic Model of Computation 1.4.3 Retiming Semisystolic Networks 1.4.4 Conversion of a Semisystolic Network into a Systolic Network 1.4.5 The Special Case of Broadcasting 1.4.6 Retiming the Host 1.4.7 Design by Systolic Conversion-A Summary 1.5 Graph Algorithms 1.5.1 Transitive Closure 1.5.2 Connected Components 1.5.3 Shortest Paths 1.5.4 Breadth-First Spanning Trees 1.5.5 Minimum Weight Spanning Trees 1.6 Sorting Revisited 1.6.1 Odd-Even Transposition Sort on a Linear Array 1.6.2 A Simple Root-N(log N + 1)-Step Sorting Algorithm 1.6.3 A (3 Root- N + o(Root-N))-Step Sorting Algorithm 1.6.4 A Matching Lower Bound 1.7 Packet Routing 1.7.1 Greedy Algorithms 1.7.2 Average-Case Analysis of Greedy Algorithms -Routing N Packets to Random Destinations -Analysis of Dynamic Routing Problems 1.7.3 Randomized Routing Algorithms 1.7.4 Deterministic Algorithms with Small Queues 1.7.5 An Off-line Algorithm 1.7.6 Other Routing Models and Algorithms 1.8 Image Analysis and Computational Geometry 1.8.1 Component-Labelling Algorithms -Levialdi's Algorithm -An O (Root-N)-Step Recursive Algorithm 1.8.2 Computing Hough Transforms 1.8.3 Nearest-Neighbor Algorithms 1.8.4 Finding Convex Hulls 1.9 Higher-Dimensional Arrays 1.9.1 Definitions and Properties 1.9.2 Matrix Multiplication 1.9.3 Sorting 1.9.4 Packet Routing 1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays 1.10 problems 1.11 Bibliographic Notes 2 Meshes of Trees 2.1 The Two-Dimensional Mesh of Trees 2.1.1 Definition and Properties 2.1.2 Recursive Decomposition 2.1.3 Derivation from KN,N 2.1.4 Variations 2.1.5 Comparison With the Pyramid and Multigrid 2.2 Elementary O(log N)-Step Algorithms 2.2.1 Routing 2.2.2 Sorting 2.2.3 Matrix-Vector Multiplication 2.2.4 Jacobi Relaxation 2.2.5 Pivoting 2.2.6 Convolution 2.2.7 Convex Hull 2.3 Integer Arithmetic 2.3.1 Multiplication 2.3.2 Division and Chinese Remaindering 2.3.3 Related Problems -Iterated Products -Rooting Finding 2.4 Matrix Algorithms 2.4.1 The Three-Dimensional Mesh of Trees 2.4.2 Matrix Multiplication 2.4.3 Inverting Lower Triangular Matrices 2.4.4 Inverting Arbitrary Matrices -Csanky's Algorithm -Inversion by Newton Iteration 2.4.5 Related Problems 2.5 Graph Algorithms 2.5.1 Minimum-Weight Spanning Trees 2.5.2 Connected Components 2.5.3 Transitive Closure 2.5.4 Shortest Paths 2.5.5 Matching Problems 2.6 Fast Evaluation of Straight-Line Code 2.6.1 Addition and Multiplication Over a Semiring 2.6.2 Extension to Codes with Subtraction and Division 2.6.3 Applications 2.7 Higher-Dimensional meshes of Trees 2.7.1 Definitions and Properties 2.7.2 The Shuffle-Tree Graph 2.8 Problems 2.9 Bibliographic Notes 3 Hypercubes and Related Networks 3.1 The Hypercube 3.1.1 Definitions and Properties 3.1.2 Containment of Arrays -Higher-Dimensional Arrays -Non-Power-of-2 Arrays 3.1.3 Containment of Complete Binary Trees 3.1.4 Embeddings of Arbitrary Binary Trees -Embeddings with Dilation 1 and Load O(M over N + log N) -Embeddings with Dilation O(1) and Load O (M over N + 1) -A Review of One-Error-Correcting Codes -Embedding Plog N into Hlog N 3.1.5 Containment of Meshes of Trees 3.1.6 Other Containment Results 3.2 The Butterfly, Cube-Connected-Cycles , and Benes Network 3.2.1 Definitions and Properties 3.2.2 Simulation of Arbitrary Networks 3.2.3 Simulation of Normal Hypercube Algorithms 3.2.4 Some Containment and Simulation Results 3.3 The Shuffle-Exchange and de Bruijn Graphs 3.3.1 Definitions and Properties 3.3.2 The Diaconis Card Tricks 3.3.3 Simulation of Normal Hypercube Algorithms 3.3.4 Similarities with the Butterfly 3.3.5 Some Containment and Simulation Results 3.4 Packet-Routing Algorithms 3.4.1 Definitions and Routing Models 3.4.2 Greedy Routing Algorithms and Worst-Case Problems 3.4.3 Packing, Spreading, and Monotone Routing Problems -Reducing a Many-to-Many Routing Problem to a Many-to-One Routing Problem -Reducing a Routing Problem to a Sorting Problem 3.4.4 The Average-Case Behavior of the Greedy Algorithm -Bounds on Congestion -Bounds on Running Time -Analyzing Non-Predictive Contention-Resolution Protocols 3.4.5 Converting Worst-Case Routing Problems into Average-Case Routing Problems -Hashing -Randomized Routing 3.4.6 Bounding Queue Sizes -Routing on Arbitrary Levelled Networks 3.4.7 Routing with Combining 3.4.8 The Information Dispersal Approach to Routing -Using Information Dispersal to Attain Fault-Tolerance -Finite Fields and Coding Theory 3.4.9 Circuit-Switching Algorithms 3.5 Sorting 3.5.1 Odd-Even Merge Sort -Constructing a Sorting Circuit with Depth log N(log N +1)/2 3.5.2 Sorting Small Sets 3.5.3 A Deterministic O(log N log log N)-Step Sorting Algorithm 3.5.4 Randomized O(log N)-Step Sorting Algorithms -A Circuit with Depth 7.45 log N that Usually Sorts 3.6 Simulating a Parallel Random Access Machine 3.6.1 PRAM Models and Shared Memories 3.6.2 Randomized Simulations Based on Hashing 3.6.3 Deterministic Simulations using Replicated Data 3.6.4 Using Information Dispersal to Improve Performance 3.7 The Fast Fourier Transform 3.7.1 The Algorithm 3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph 3.7.3 Application to Convolution and Polynomial Arithmetic 3.7.4 Application to Integer Multiplication 3.8 Other Hypercubic Networks 3.8.1 Butterflylike Networks -The Omega Network -The Flip Network -The Baseline and Reverse Baseline Networks -Banyan and Delta Networks -k-ary Butterflies 3.8.2 De Bruijn-Type Networks -The k-ary de Bruijn Graph -The Generalized Shuffle-Exchange Graph 3.9 Problems 3.10 Bibliographic Notes Bibliography Index Lemmas, Theorems, and Corollaries Author Index Subject Index

...read moreread less

2,895 citations

"An introduction to parallel algorit..." refers background in this paper

...Multiprocessorbased computers have been around for decades and various types of computer architectures [2] have been implemented in hardware throughout the years with different types of advantages/performance gains depending on the application....
[...]
...Every location in the array represents a node of the tree: T [1] is the root, with children at T [2] and T [3]....
[...]
...The text by [2] is a good start as it contains a comprehensive description of algorithms and different architecture topologies for the network model (tree, hypercube, mesh, and butterfly)....
[...]

Book•

Computer Architecture and Parallel Processing

[...]

Kai Hwang, Faye A. Briggs

01 Jan 1984

TL;DR: The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

...read moreread less

Abstract: The book is intended as a text to support two semesters of courses in computer architecture at the college senior and graduate levels. There are excellent problems for students at the end of each chapter. The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

...read moreread less

1,410 citations

"An introduction to parallel algorit..." refers background in this paper

...Parallel architectures have been described in several books (see, for example, [18, 29])....
[...]

Journal Article•DOI•

Data parallel algorithms

[...]

W. Daniel Hillis, Guy L. Steele

01 Dec 1986-Communications of The ACM

TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

...read moreread less

Abstract: Parallel computers with tens of thousands of processors are typically programmed in a data parallel style, as opposed to the control parallel style used in multiprocessing. The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

...read moreread less

1,000 citations

"An introduction to parallel algorit..." refers background in this paper

...Recent work on the mapping of PRAM algorithms on bounded-degree networks is described in [3,13,14, 20, 25], Our presentation on the communication complexity of the matrix-multiplication problem in the sharedmemory model is taken from [1], Data-parallel algorithms are described in [15]....
[...]

Proceedings Article•DOI•

Parallelism in random access machines

[...]

Steven Fortune, James C. Wyllie

01 May 1978

TL;DR: A model of computation based on random access machines operating in parallel and sharing a common memory is presented and can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines.

...read moreread less

Abstract: A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAM's can accept in polynomial time exactly the sets accepted by polynomial tape bounded Turing machines; nondeterministic RAM's can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines. Similar results hold for other classes. The effect of limiting the size of the common memory is also considered.

...read moreread less

951 citations

"An introduction to parallel algorit..." refers background in this paper

...Rigorous descriptions of shared-memory models were introduced later in [11,12]....
[...]

Journal Article•DOI•

The Parallel Evaluation of General Arithmetic Expressions

[...]

Richard P. Brent¹•Institutions (1)

Australian National University¹

01 Apr 1974-Journal of the ACM

TL;DR: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log 2 + 10(n - 1) using processors which can independently perform arithmetic operations in unit time.

...read moreread less

Abstract: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.

...read moreread less

864 citations

"An introduction to parallel algorit..." refers methods in this paper

...The WT scheduling principle is derived from a theorem in [7], In the literature, this principle is commonly referred to as Brent's theorem or Brent's scheduling principle....
[...]