scispace - formally typeset
Search or ask a question
Book

An introduction to parallel algorithms

01 Oct 1992-
TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, with the emphasis on the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.
Abstract: Written by an authority in the field, this book provides an introduction to the design and analysis of parallel algorithms. The emphasis is on the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis. Special attention is given to the selection of relevant data structures and to algorithm design principles that have proved to be useful. Features *Uses PRAM (parallel random access machine) as the model for parallel computation. *Covers all essential classes of parallel algorithms. *Rich exercise sets. *Written by a highly respected author within the field. 0201548569B04062001

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: This work considers the following “fence enclosure” problem: given a set of points in the plane with valuesvi ≥ 0, the authors wish to enclose a subset of the points with a fence (a simple closed curve) in order to maximize the “value” of the enclosure.
Abstract: We study a variety of geometric versions of the classical knapsack problem. In particular, we consider the following "fence enclosure" problem: given a setS ofn points in the plane with valuesvi ź 0, we wish to enclose a subset of the points with a fence (a simple closed curve) in order to maximize the "value" of the enclosure. The value of the enclosure is defined to be the sum of the values of the enclosed points minus the cost of the fence. We consider various versions of the problem, such as allowingS to consist of points and/or simple polygons. Other versions of the problems are obtained by restricting the total amount of fence available and also allowing the enclosure to consist of at mostM connected components. When there is an upper bound on the length of fence available, we show that the problem is NP-complete. We also provide polynomial-time algorithms for many versions of the fence problem when an unrestricted amount of fence is available.

32 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This paper presents an algorithm that for graphs with diameter D in the wide range [log^ε n, n], takes O(log D) rounds to identify the connected components and takes O (log log n) rounds for all other graphs and uses an optimal total space of O(m).
Abstract: Identifying the connected components of a graph, apart from being a fundamental problem with countless applications, is a key primitive for many other algorithms. In this paper, we consider this problem in parallel settings. Particularly, we focus on the Massively Parallel Computations (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark. We consider the truly sublinear regime of MPC for graph problems where the space per machine is n^δ for some desirably small constant δ ∊ (0, 1). We present an algorithm that for graphs with diameter D in the wide range [log^e n, n], takes O(log D) rounds to identify the connected components and takes O(log log n) rounds for all other graphs. The algorithm is randomized, succeeds with high probability, does not require prior knowledge of D, and uses an optimal total space of O(m). We complement this by showing a conditional lower-bound based on the widely believed TwoCycle conjecture that Ω(log D) rounds are indeed necessary in this setting. Studying parallel connectivity algorithms received a resurgence of interest after the pioneering work of Andoni etal [FOCS 2018] who presented an algorithm with O(log D log log n) round-complexity. Our algorithm improves this result for the whole range of values of D and almost settles the problem due to the conditional lower-bound. Additionally, we show that with minimal adjustments, our algorithm can also be implemented in a variant of (CRCW) PRAM in asymptotically the same number of rounds.

32 citations


Cites background from "An introduction to parallel algorit..."

  • ...Note that in work-depth models, we do not concern ourselves with how processors are mapped to tasks (see for example, Jaja [43] or Blelloch et al....

    [...]

Journal ArticleDOI
TL;DR: This paper presents parallel algorithms for the following four operations on red–black trees: construction, search, insertion, and deletion, which run in O ( log n+ log k) time with k processors on the EREW PRAM.

32 citations

01 Jan 2013
TL;DR: This thesis develops highly efficient and parallelizable algorithms for solving linear systems involving graph Laplacian matrices and gives two solvers that take diametrically opposite approaches, the first highly efficient solver for Laplachian linear systems that parallelizes almost completely.
Abstract: Spectral graph theory is the interplay between linear algebra and combinatorial graph theory. Laplace’s equation and its discrete form, the Laplacian matrix, appear ubiquitously in mathematical physics. Due to the recent discovery of very fast solvers for these equations, they are also becoming increasingly useful in combinatorial optimization, computer vision, computer graphics, and machine learning. In this thesis, we develop highly efficient and parallelizable algorithms for solving linear systems involving graph Laplacian matrices. These solvers can also be extended to symmetric diagonally dominant matrices and M -matrices, both of which are closely related to graph Laplacians. Our algorithms build upon two decades of progress on combinatorial preconditioning, which connects numerical and combinatorial algorithms through spectral graph theory. They in turn rely on tools from numerical analysis, metric embeddings, and random matrix theory. We give two solver algorithms that take diametrically opposite approaches. The first is motivated by combinatorial algorithms, and aims to gradually break the problem into several smaller ones. It represents major simplifications over previous solver constructions, and has theoretical running time comparable to sorting. The second is motivated by numerical analysis, and aims to rapidly improve the algebraic connectivity of the graph. It is the first highly efficient solver for Laplacian linear systems that parallelizes almost completely. Our results improve the performances of applications of fast linear system solvers ranging from scientific computing to algorithmic graph theory. We also show that these solvers can be used to address broad classes of image processing tasks, and give some preliminary experimental results.

31 citations

Book
01 Jun 2017
TL;DR: Ligra, the first high-level shared-memory framework for parallel graph traversal algorithms, is introduced, enabling short and concise implementations that deliver performance competitive with that of highly optimized code and up to orders of magnitude faster than previous systems designed for distributed memory.
Abstract: Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult, and often requires significant expertise. To address this challenge, it is crucial to provide programmers with high-level tools to enable them to develop solutions easily, and at the same time emphasize the theoretical and practical aspects of algorithm design to allow the solutions developed to run efficiently under many different settings. This book, a revised version of the thesis that won the 2015 ACM Doctoral Dissertation Award, addresses this challenge using a three-pronged approach consisting of the design of shared-memory programming techniques, frameworks, and algorithms for important problems in computing. It provides evidence that with appropriate programming techniques, frameworks, and algorithms, shared-memory programs can be simple, fast, and scalable, both in theory and in practice. The results serve to ease the transition into the multicore era. The book starts by introducing tools and techniques for deterministic parallel programming, including means for encapsulating nondeterminism via powerful commutative building blocks, as well as a novel framework for executing sequential iterative loops in parallel, which lead to deterministic parallel algorithms that are efficient both in theory and in practice. The book then introduces Ligra, the first high-level shared-memory framework for parallel graph traversal algorithms. The framework enables short and concise implementations that deliver performance competitive with that of highly optimized code and up to orders of magnitude faster than previous systems designed for distributed memory. Finally, the book bridges the gap between theory and practice in parallel algorithm design by introducing the first algorithms for a variety of important problems on graphs and strings that are both practical and theoretically efficient.

31 citations


Cites background or methods from "An introduction to parallel algorit..."

  • ..., n− 1] [388] to group all the nodes, and then (2) sorting the nodes within each group using a parallel comparison sort [243]....

    [...]

  • ...They can also be analyzed in the Work-Time Framework [243], in which the total number of operations W and number of parallel time steps T is specified....

    [...]

  • ...As with list contraction, parallel algorithms for tree contraction have received considerable interest [335, 243, 395]....

    [...]

  • ...Algorithms are analyzed using the Work-Time Framework [243], where work is the same as in the work-depth model and time is equivalent to depth in the work-depth model....

    [...]

  • ...The main goal in developing efficient parallel algorithms is to have an algorithm with low (polylogarithmic) depth and work matching that of the best 5This contrasts with the Work-Time Framework, which is a framework for analyzing PRAM algorithms and does not allow for nested parallelism....

    [...]

References
More filters
Book
01 Sep 1991
TL;DR: This chapter discusses sorting on a Linear Array with a Systolic and Semisystolic Model of Computation, which automates the very labor-intensive and therefore time-heavy and expensive process of manually sorting arrays.
Abstract: Preface Acknowledgments Notation 1 Arrays and Trees 1.1 Elementary Sorting and Counting 1.1.1 Sorting on a Linear Array Assessing the Performance of the Algorithm Sorting N Numbers with Fewer Than N Processors 1.1.2 Sorting in the Bit Model 1.1.3 Lower Bounds 1.1.4 A Counterexample-Counting 1.1.5 Properties of the Fixed-Connection Network Model 1.2 Integer Arithmetic 1.2.1 Carry-Lookahead Addition 1.2.2 Prefix Computations-Segmented Prefix Computations 1.2.3 Carry-Save Addition 1.2.4 Multiplication and Convolution 1.2.5 Division and Newton Iteration 1.3 Matrix Algorithms 1.3.1 Elementary Matrix Products 1.3.2 Algorithms for Triangular Matrices 1.3.3 Algorithms for Tridiagonal Matrices -Odd-Even Reduction -Parallel Prefix Algorithms 1.3.4 Gaussian Elimination 1.3.5 Iterative Methods -Jacobi Relaxation -Gauss-Seidel Relaxation Finite Difference Methods -Multigrid Methods 1.4 Retiming and Systolic Conversion 1.4.1 A Motivating Example-Palindrome Recognition 1.4.2 The Systolic and Semisystolic Model of Computation 1.4.3 Retiming Semisystolic Networks 1.4.4 Conversion of a Semisystolic Network into a Systolic Network 1.4.5 The Special Case of Broadcasting 1.4.6 Retiming the Host 1.4.7 Design by Systolic Conversion-A Summary 1.5 Graph Algorithms 1.5.1 Transitive Closure 1.5.2 Connected Components 1.5.3 Shortest Paths 1.5.4 Breadth-First Spanning Trees 1.5.5 Minimum Weight Spanning Trees 1.6 Sorting Revisited 1.6.1 Odd-Even Transposition Sort on a Linear Array 1.6.2 A Simple Root-N(log N + 1)-Step Sorting Algorithm 1.6.3 A (3 Root- N + o(Root-N))-Step Sorting Algorithm 1.6.4 A Matching Lower Bound 1.7 Packet Routing 1.7.1 Greedy Algorithms 1.7.2 Average-Case Analysis of Greedy Algorithms -Routing N Packets to Random Destinations -Analysis of Dynamic Routing Problems 1.7.3 Randomized Routing Algorithms 1.7.4 Deterministic Algorithms with Small Queues 1.7.5 An Off-line Algorithm 1.7.6 Other Routing Models and Algorithms 1.8 Image Analysis and Computational Geometry 1.8.1 Component-Labelling Algorithms -Levialdi's Algorithm -An O (Root-N)-Step Recursive Algorithm 1.8.2 Computing Hough Transforms 1.8.3 Nearest-Neighbor Algorithms 1.8.4 Finding Convex Hulls 1.9 Higher-Dimensional Arrays 1.9.1 Definitions and Properties 1.9.2 Matrix Multiplication 1.9.3 Sorting 1.9.4 Packet Routing 1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays 1.10 problems 1.11 Bibliographic Notes 2 Meshes of Trees 2.1 The Two-Dimensional Mesh of Trees 2.1.1 Definition and Properties 2.1.2 Recursive Decomposition 2.1.3 Derivation from KN,N 2.1.4 Variations 2.1.5 Comparison With the Pyramid and Multigrid 2.2 Elementary O(log N)-Step Algorithms 2.2.1 Routing 2.2.2 Sorting 2.2.3 Matrix-Vector Multiplication 2.2.4 Jacobi Relaxation 2.2.5 Pivoting 2.2.6 Convolution 2.2.7 Convex Hull 2.3 Integer Arithmetic 2.3.1 Multiplication 2.3.2 Division and Chinese Remaindering 2.3.3 Related Problems -Iterated Products -Rooting Finding 2.4 Matrix Algorithms 2.4.1 The Three-Dimensional Mesh of Trees 2.4.2 Matrix Multiplication 2.4.3 Inverting Lower Triangular Matrices 2.4.4 Inverting Arbitrary Matrices -Csanky's Algorithm -Inversion by Newton Iteration 2.4.5 Related Problems 2.5 Graph Algorithms 2.5.1 Minimum-Weight Spanning Trees 2.5.2 Connected Components 2.5.3 Transitive Closure 2.5.4 Shortest Paths 2.5.5 Matching Problems 2.6 Fast Evaluation of Straight-Line Code 2.6.1 Addition and Multiplication Over a Semiring 2.6.2 Extension to Codes with Subtraction and Division 2.6.3 Applications 2.7 Higher-Dimensional meshes of Trees 2.7.1 Definitions and Properties 2.7.2 The Shuffle-Tree Graph 2.8 Problems 2.9 Bibliographic Notes 3 Hypercubes and Related Networks 3.1 The Hypercube 3.1.1 Definitions and Properties 3.1.2 Containment of Arrays -Higher-Dimensional Arrays -Non-Power-of-2 Arrays 3.1.3 Containment of Complete Binary Trees 3.1.4 Embeddings of Arbitrary Binary Trees -Embeddings with Dilation 1 and Load O(M over N + log N) -Embeddings with Dilation O(1) and Load O (M over N + 1) -A Review of One-Error-Correcting Codes -Embedding Plog N into Hlog N 3.1.5 Containment of Meshes of Trees 3.1.6 Other Containment Results 3.2 The Butterfly, Cube-Connected-Cycles , and Benes Network 3.2.1 Definitions and Properties 3.2.2 Simulation of Arbitrary Networks 3.2.3 Simulation of Normal Hypercube Algorithms 3.2.4 Some Containment and Simulation Results 3.3 The Shuffle-Exchange and de Bruijn Graphs 3.3.1 Definitions and Properties 3.3.2 The Diaconis Card Tricks 3.3.3 Simulation of Normal Hypercube Algorithms 3.3.4 Similarities with the Butterfly 3.3.5 Some Containment and Simulation Results 3.4 Packet-Routing Algorithms 3.4.1 Definitions and Routing Models 3.4.2 Greedy Routing Algorithms and Worst-Case Problems 3.4.3 Packing, Spreading, and Monotone Routing Problems -Reducing a Many-to-Many Routing Problem to a Many-to-One Routing Problem -Reducing a Routing Problem to a Sorting Problem 3.4.4 The Average-Case Behavior of the Greedy Algorithm -Bounds on Congestion -Bounds on Running Time -Analyzing Non-Predictive Contention-Resolution Protocols 3.4.5 Converting Worst-Case Routing Problems into Average-Case Routing Problems -Hashing -Randomized Routing 3.4.6 Bounding Queue Sizes -Routing on Arbitrary Levelled Networks 3.4.7 Routing with Combining 3.4.8 The Information Dispersal Approach to Routing -Using Information Dispersal to Attain Fault-Tolerance -Finite Fields and Coding Theory 3.4.9 Circuit-Switching Algorithms 3.5 Sorting 3.5.1 Odd-Even Merge Sort -Constructing a Sorting Circuit with Depth log N(log N +1)/2 3.5.2 Sorting Small Sets 3.5.3 A Deterministic O(log N log log N)-Step Sorting Algorithm 3.5.4 Randomized O(log N)-Step Sorting Algorithms -A Circuit with Depth 7.45 log N that Usually Sorts 3.6 Simulating a Parallel Random Access Machine 3.6.1 PRAM Models and Shared Memories 3.6.2 Randomized Simulations Based on Hashing 3.6.3 Deterministic Simulations using Replicated Data 3.6.4 Using Information Dispersal to Improve Performance 3.7 The Fast Fourier Transform 3.7.1 The Algorithm 3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph 3.7.3 Application to Convolution and Polynomial Arithmetic 3.7.4 Application to Integer Multiplication 3.8 Other Hypercubic Networks 3.8.1 Butterflylike Networks -The Omega Network -The Flip Network -The Baseline and Reverse Baseline Networks -Banyan and Delta Networks -k-ary Butterflies 3.8.2 De Bruijn-Type Networks -The k-ary de Bruijn Graph -The Generalized Shuffle-Exchange Graph 3.9 Problems 3.10 Bibliographic Notes Bibliography Index Lemmas, Theorems, and Corollaries Author Index Subject Index

2,895 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Multiprocessorbased computers have been around for decades and various types of computer architectures [2] have been implemented in hardware throughout the years with different types of advantages/performance gains depending on the application....

    [...]

  • ...Every location in the array represents a node of the tree: T [1] is the root, with children at T [2] and T [3]....

    [...]

  • ...The text by [2] is a good start as it contains a comprehensive description of algorithms and different architecture topologies for the network model (tree, hypercube, mesh, and butterfly)....

    [...]

Book
01 Jan 1984
TL;DR: The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.
Abstract: The book is intended as a text to support two semesters of courses in computer architecture at the college senior and graduate levels. There are excellent problems for students at the end of each chapter. The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

1,410 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Parallel architectures have been described in several books (see, for example, [18, 29])....

    [...]

Journal ArticleDOI
TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
Abstract: Parallel computers with tens of thousands of processors are typically programmed in a data parallel style, as opposed to the control parallel style used in multiprocessing. The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

1,000 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Recent work on the mapping of PRAM algorithms on bounded-degree networks is described in [3,13,14, 20, 25], Our presentation on the communication complexity of the matrix-multiplication problem in the sharedmemory model is taken from [1], Data-parallel algorithms are described in [15]....

    [...]

Proceedings ArticleDOI
01 May 1978
TL;DR: A model of computation based on random access machines operating in parallel and sharing a common memory is presented and can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines.
Abstract: A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAM's can accept in polynomial time exactly the sets accepted by polynomial tape bounded Turing machines; nondeterministic RAM's can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines. Similar results hold for other classes. The effect of limiting the size of the common memory is also considered.

951 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Rigorous descriptions of shared-memory models were introduced later in [11,12]....

    [...]

Journal ArticleDOI
TL;DR: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log 2 + 10(n - 1) using processors which can independently perform arithmetic operations in unit time.
Abstract: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.

864 citations


"An introduction to parallel algorit..." refers methods in this paper

  • ...The WT scheduling principle is derived from a theorem in [7], In the literature, this principle is commonly referred to as Brent's theorem or Brent's scheduling principle....

    [...]