Book

An introduction to parallel algorithms

01 Oct 1992-
TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, emphasizing the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis.
Abstract: Written by an authority in the field, this book provides an introduction to the design and analysis of parallel algorithms. The emphasis is on the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis. Special attention is given to the selection of relevant data structures and to algorithm design principles that have proved to be useful.
Features:
  • Uses the PRAM (parallel random access machine) as the model for parallel computation.
  • Covers all essential classes of parallel algorithms.
  • Rich exercise sets.
  • Written by a highly respected author within the field.


Citations
Proceedings ArticleDOI
01 Jun 1997
TL;DR: The Queuing Shared Memory (QSM) model is introduced, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction; the authors conclude that shared-memory models can potentially serve as viable alternatives to existing message-passing, distributed-memory bridging models.
Abstract: There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely-used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared to the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d,x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios in which the high-level QSM more accurately models certain machines than the more detailed BSP and LogP models. Finally, we present algorithmic results for the QSM, as well as general strategies for mapping algorithms designed for the BSP or PRAM models onto the QSM model. Our main conclusion is that shared-memory models can potentially serve as viable alternatives to existing message-passing, distributed-memory bridging models.
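
As a concrete illustration of the accounting such a bridging model implies, below is a toy sketch of a per-phase cost function in the spirit of the QSM description above. The function name, its parameters, and the exact max-of-three formula are our assumptions for illustration, not definitions taken from the paper.

    from collections import Counter

    def qsm_phase_cost(local_ops, accesses, g):
        """Toy QSM-style cost of one phase.

        local_ops[i] -- number of local operations by processor i.
        accesses[i]  -- shared-memory locations processor i reads or
                        writes during the phase.
        g            -- bandwidth gap: cost of a shared access relative
                        to a local operation.

        The phase is charged the maximum of: local work, g times the
        heaviest per-processor access load, and the longest queue at
        any single shared location (contention).
        """
        contention = Counter(loc for acc in accesses for loc in acc)
        kappa = max(contention.values(), default=0)
        return max(max(local_ops),
                   g * max(len(acc) for acc in accesses),
                   kappa)

    # Example: 4 processors, three of them queue on location 7.
    print(qsm_phase_cost([5, 5, 5, 5], [[7], [7], [7], [0]], g=2))  # 5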

76 citations


Cites background from "An introduction to parallel algorit..."

  • ...Primary among them are the parallel random access machine (PRAM) model [29], [48], [43], [65], in which processors execute in lock-step and communicate by reading and writing locations in a shared memory, and network-based models (hypercube, butterfly, arrays, etc....


  • ..., [43], [48], and [65])....


Journal ArticleDOI
TL;DR: This paper resolves a long-standing open problem on whether the concurrent-write capability of the parallel random access machine (PRAM) is essential for solving fundamental graph problems like connected components and minimum spanning trees in logarithmic time.
Abstract: This paper resolves a long-standing open problem on whether the concurrent-write capability of the parallel random access machine (PRAM) is essential for solving fundamental graph problems like connected components and minimum spanning trees in O(log n) time. Specifically, we present a new algorithm to solve these problems in O(log n) time using a linear number of processors on the exclusive-read exclusive-write PRAM. The logarithmic time bound is actually optimal since it is well known that even computing the "OR" of n bits requires Ω(log n) time on the exclusive-write PRAM. The efficiency achieved by the new algorithm is based on a new schedule which can exploit a high degree of parallelism.
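
The flavor of such algorithms can be seen in miniature below: a toy, sequentially simulated min-label propagation for connected components. It illustrates the repeated "adopt the smallest label in the neighborhood" pattern, but not the paper's EREW schedule or its O(log n) bound; the names and structure are our own illustration.

    def connected_components(n, edges):
        """Toy connected components by min-label propagation.

        Each round, both endpoints of every edge adopt the smaller of
        their two labels -- a sequential simulation of one parallel
        step. Labels converge to the minimum vertex id per component,
        but possibly only after O(diameter) rounds; the cited paper's
        contribution is an EREW PRAM schedule running in O(log n) time.
        """
        label = list(range(n))
        changed = True
        while changed:
            changed = False
            for u, v in edges:
                m = min(label[u], label[v])
                if label[u] != m or label[v] != m:
                    label[u] = label[v] = m
                    changed = True
        return label

    # Example: two components, {0, 1, 2} and {3, 4}.
    print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3]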

76 citations


Additional excerpts

  • ...[JáJá 1992, Lemma 5.4]....


Journal ArticleDOI
TL;DR: A remarkable property of this algorithm is that it does not require any assumptions about the root separation of f, which were required, either explicitly or implicitly, by previous algorithms; it also has a work-efficient parallel implementation.

76 citations

Proceedings ArticleDOI
John H. Reif
20 Jul 1995
TL;DR: Two abstract models of molecular computation are proposed; the first, the Parallel Associative Memory (PAM) Model, is a very high-level model which appears to improve the power of molecular parallelism beyond the operations previously considered by Lipton.
Abstract: This paper describes techniques for massively parallel computation at the molecular scale, which we refer to as molecular parallelism. While this may at first appear to be purely science fiction, Adleman [A 94] has already employed molecular parallelism in the solution of the Hamiltonian path problem on n nodes using O(n^2) lab steps on recombinant DNA of length O(n log n) base pairs; he successfully tested his techniques in a small lab experiment on DNA for a 7-node Hamiltonian path problem. Lipton [L 94] showed that finding the satisfying inputs to a Boolean circuit of size n can be done in O(n) lab steps using DNA of length O(n log n) base pairs. This recent work by Adleman and Lipton in molecular parallelism considered only the solution of NP search problems, and provided no way of quickly executing lengthy computations by purely molecular means; the number of lab steps depended linearly on the size of the simulated circuit. This paper describes techniques for quickly executing lengthy computations by the use of molecular parallelism. We demonstrate that molecular computations can be done using short DNA strands by more or less conventional biotechnology engineering techniques within a small number of lab steps. We propose two abstract models of molecular computation. The first, the Parallel Associative Memory (PAM) Model, is a very high-level model which includes a Parallel Associative Matching (PA-Match) operation that appears to improve the power of molecular parallelism beyond the operations previously considered by Lipton [L 94]. We give some simulations of conventional sequential and parallel computational models by our PAM model. Each of the simulations uses strings of length O(s) over an alphabet of size n (which correspond to DNA of length O(s log n) base pairs). Using O(t) PA-Match operations as well as O(s log s) PAM operations that are not PA-Match, we can: (1) simulate a nondeterministic Turing machine computation with space bound s ≤ t and time bound 2^O(s); (2) simulate a CREW PRAM with time bound D, M memory cells, and processor bound P, where here s = O(log(PM)) and t = D + s; (3) find the satisfying inputs to a Boolean circuit constructible in s space with n inputs, unbounded fan-out, and depth D, where here t = D + s; and (4) do List and Tree Contraction for inputs of size L constructible in s space (using here tubes each with O(B + L/log L) aggregates and assuming a precomputed tube of B aggregates for the Contraction operation), where here t =

75 citations

Journal ArticleDOI
TL;DR: This paper investigates the quantitative reconstruction of the 3D structure of a scene from a line drawing, by using the geometrical constraints provided by the location of vanishing points to design an algorithm which has several advantages with respect to the usual approach based on a reduction to linear programming.
Abstract: This paper investigates the quantitative reconstruction of the 3D structure of a scene from a line drawing, by using the geometrical constraints provided by the location of vanishing points. The additional information on vanishing points allows the design of an algorithm which has several advantages with respect to the usual approach based on a reduction to linear programming (Sugihara, 1982). These advantages range from a lower computational complexity to error tolerance and exact reconstruction of the 3D geometry of the objects. These features make the algorithm a useful tool for the quantitative analysis of real-world images, relevant to several tasks from scene understanding to automatic vehicle guidance.

75 citations

References
Book
01 Sep 1991
TL;DR: This chapter discusses sorting on a linear array and introduces the systolic and semisystolic models of computation.
Abstract: Preface Acknowledgments Notation

1 Arrays and Trees 1.1 Elementary Sorting and Counting 1.1.1 Sorting on a Linear Array -Assessing the Performance of the Algorithm -Sorting N Numbers with Fewer Than N Processors 1.1.2 Sorting in the Bit Model 1.1.3 Lower Bounds 1.1.4 A Counterexample-Counting 1.1.5 Properties of the Fixed-Connection Network Model 1.2 Integer Arithmetic 1.2.1 Carry-Lookahead Addition 1.2.2 Prefix Computations-Segmented Prefix Computations 1.2.3 Carry-Save Addition 1.2.4 Multiplication and Convolution 1.2.5 Division and Newton Iteration 1.3 Matrix Algorithms 1.3.1 Elementary Matrix Products 1.3.2 Algorithms for Triangular Matrices 1.3.3 Algorithms for Tridiagonal Matrices -Odd-Even Reduction -Parallel Prefix Algorithms 1.3.4 Gaussian Elimination 1.3.5 Iterative Methods -Jacobi Relaxation -Gauss-Seidel Relaxation -Finite Difference Methods -Multigrid Methods 1.4 Retiming and Systolic Conversion 1.4.1 A Motivating Example-Palindrome Recognition 1.4.2 The Systolic and Semisystolic Model of Computation 1.4.3 Retiming Semisystolic Networks 1.4.4 Conversion of a Semisystolic Network into a Systolic Network 1.4.5 The Special Case of Broadcasting 1.4.6 Retiming the Host 1.4.7 Design by Systolic Conversion-A Summary 1.5 Graph Algorithms 1.5.1 Transitive Closure 1.5.2 Connected Components 1.5.3 Shortest Paths 1.5.4 Breadth-First Spanning Trees 1.5.5 Minimum Weight Spanning Trees 1.6 Sorting Revisited 1.6.1 Odd-Even Transposition Sort on a Linear Array 1.6.2 A Simple Root-N (log N + 1)-Step Sorting Algorithm 1.6.3 A (3 Root-N + o(Root-N))-Step Sorting Algorithm 1.6.4 A Matching Lower Bound 1.7 Packet Routing 1.7.1 Greedy Algorithms 1.7.2 Average-Case Analysis of Greedy Algorithms -Routing N Packets to Random Destinations -Analysis of Dynamic Routing Problems 1.7.3 Randomized Routing Algorithms 1.7.4 Deterministic Algorithms with Small Queues 1.7.5 An Off-line Algorithm 1.7.6 Other Routing Models and Algorithms 1.8 Image Analysis and Computational Geometry 1.8.1 Component-Labelling Algorithms -Levialdi's Algorithm -An O(Root-N)-Step Recursive Algorithm 1.8.2 Computing Hough Transforms 1.8.3 Nearest-Neighbor Algorithms 1.8.4 Finding Convex Hulls 1.9 Higher-Dimensional Arrays 1.9.1 Definitions and Properties 1.9.2 Matrix Multiplication 1.9.3 Sorting 1.9.4 Packet Routing 1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays 1.10 Problems 1.11 Bibliographic Notes

2 Meshes of Trees 2.1 The Two-Dimensional Mesh of Trees 2.1.1 Definition and Properties 2.1.2 Recursive Decomposition 2.1.3 Derivation from KN,N 2.1.4 Variations 2.1.5 Comparison With the Pyramid and Multigrid 2.2 Elementary O(log N)-Step Algorithms 2.2.1 Routing 2.2.2 Sorting 2.2.3 Matrix-Vector Multiplication 2.2.4 Jacobi Relaxation 2.2.5 Pivoting 2.2.6 Convolution 2.2.7 Convex Hull 2.3 Integer Arithmetic 2.3.1 Multiplication 2.3.2 Division and Chinese Remaindering 2.3.3 Related Problems -Iterated Products -Root Finding 2.4 Matrix Algorithms 2.4.1 The Three-Dimensional Mesh of Trees 2.4.2 Matrix Multiplication 2.4.3 Inverting Lower Triangular Matrices 2.4.4 Inverting Arbitrary Matrices -Csanky's Algorithm -Inversion by Newton Iteration 2.4.5 Related Problems 2.5 Graph Algorithms 2.5.1 Minimum-Weight Spanning Trees 2.5.2 Connected Components 2.5.3 Transitive Closure 2.5.4 Shortest Paths 2.5.5 Matching Problems 2.6 Fast Evaluation of Straight-Line Code 2.6.1 Addition and Multiplication Over a Semiring 2.6.2 Extension to Codes with Subtraction and Division 2.6.3 Applications 2.7 Higher-Dimensional Meshes of Trees 2.7.1 Definitions and Properties 2.7.2 The Shuffle-Tree Graph 2.8 Problems 2.9 Bibliographic Notes

3 Hypercubes and Related Networks 3.1 The Hypercube 3.1.1 Definitions and Properties 3.1.2 Containment of Arrays -Higher-Dimensional Arrays -Non-Power-of-2 Arrays 3.1.3 Containment of Complete Binary Trees 3.1.4 Embeddings of Arbitrary Binary Trees -Embeddings with Dilation 1 and Load O(M/N + log N) -Embeddings with Dilation O(1) and Load O(M/N + 1) -A Review of One-Error-Correcting Codes -Embedding P_log N into H_log N 3.1.5 Containment of Meshes of Trees 3.1.6 Other Containment Results 3.2 The Butterfly, Cube-Connected-Cycles, and Benes Network 3.2.1 Definitions and Properties 3.2.2 Simulation of Arbitrary Networks 3.2.3 Simulation of Normal Hypercube Algorithms 3.2.4 Some Containment and Simulation Results 3.3 The Shuffle-Exchange and de Bruijn Graphs 3.3.1 Definitions and Properties 3.3.2 The Diaconis Card Tricks 3.3.3 Simulation of Normal Hypercube Algorithms 3.3.4 Similarities with the Butterfly 3.3.5 Some Containment and Simulation Results 3.4 Packet-Routing Algorithms 3.4.1 Definitions and Routing Models 3.4.2 Greedy Routing Algorithms and Worst-Case Problems 3.4.3 Packing, Spreading, and Monotone Routing Problems -Reducing a Many-to-Many Routing Problem to a Many-to-One Routing Problem -Reducing a Routing Problem to a Sorting Problem 3.4.4 The Average-Case Behavior of the Greedy Algorithm -Bounds on Congestion -Bounds on Running Time -Analyzing Non-Predictive Contention-Resolution Protocols 3.4.5 Converting Worst-Case Routing Problems into Average-Case Routing Problems -Hashing -Randomized Routing 3.4.6 Bounding Queue Sizes -Routing on Arbitrary Levelled Networks 3.4.7 Routing with Combining 3.4.8 The Information Dispersal Approach to Routing -Using Information Dispersal to Attain Fault-Tolerance -Finite Fields and Coding Theory 3.4.9 Circuit-Switching Algorithms 3.5 Sorting 3.5.1 Odd-Even Merge Sort -Constructing a Sorting Circuit with Depth log N(log N + 1)/2 3.5.2 Sorting Small Sets 3.5.3 A Deterministic O(log N log log N)-Step Sorting Algorithm 3.5.4 Randomized O(log N)-Step Sorting Algorithms -A Circuit with Depth 7.45 log N that Usually Sorts 3.6 Simulating a Parallel Random Access Machine 3.6.1 PRAM Models and Shared Memories 3.6.2 Randomized Simulations Based on Hashing 3.6.3 Deterministic Simulations Using Replicated Data 3.6.4 Using Information Dispersal to Improve Performance 3.7 The Fast Fourier Transform 3.7.1 The Algorithm 3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph 3.7.3 Application to Convolution and Polynomial Arithmetic 3.7.4 Application to Integer Multiplication 3.8 Other Hypercubic Networks 3.8.1 Butterflylike Networks -The Omega Network -The Flip Network -The Baseline and Reverse Baseline Networks -Banyan and Delta Networks -k-ary Butterflies 3.8.2 De Bruijn-Type Networks -The k-ary de Bruijn Graph -The Generalized Shuffle-Exchange Graph 3.9 Problems 3.10 Bibliographic Notes

Bibliography Index of Lemmas, Theorems, and Corollaries Author Index Subject Index

2,895 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Multiprocessor-based computers have been around for decades and various types of computer architectures [2] have been implemented in hardware throughout the years with different types of advantages/performance gains depending on the application....


  • ...Every location in the array represents a node of the tree: T[1] is the root, with children at T[2] and T[3].... (This implicit array layout is sketched in code after these excerpts.)


  • ...The text by [2] is a good start as it contains a comprehensive description of algorithms and different architecture topologies for the network model (tree, hypercube, mesh, and butterfly)....

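The tree-in-an-array excerpt above uses the standard implicit layout of a complete binary tree. A minimal sketch of that indexing scheme follows; the class and method names are ours, for illustration only.

    class ImplicitBinaryTree:
        """Complete binary tree stored in an array, 1-indexed.

        T[1] is the root; node i has children at 2*i and 2*i + 1 and
        its parent at i // 2, so no pointers need to be stored.
        """
        def __init__(self, values):
            self.T = [None] + list(values)  # pad index 0 so the root is T[1]

        def root(self):
            return 1

        def left(self, i):
            return 2 * i

        def right(self, i):
            return 2 * i + 1

        def parent(self, i):
            return i // 2

    # Example: root 'a' with children 'b' and 'c'.
    t = ImplicitBinaryTree("abcdefg")
    print(t.T[t.root()], t.T[t.left(1)], t.T[t.right(1)])  # a b c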

Book
01 Jan 1984
TL;DR: The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.
Abstract: The book is intended as a text to support two semesters of courses in computer architecture at the college senior and graduate levels. There are excellent problems for students at the end of each chapter. The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

1,410 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Parallel architectures have been described in several books (see, for example, [18, 29])....


Journal ArticleDOI
TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
Abstract: Parallel computers with tens of thousands of processors are typically programmed in a data parallel style, as opposed to the control parallel style used in multiprocessing. The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
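
A classic example of this style is the prefix sum (scan), in which every element is updated simultaneously and n values are combined in ceil(log2 n) steps. The sketch below is our own sequential simulation of that pattern, not code from the paper.

    def inclusive_scan(xs):
        """Inclusive prefix sum in the data-parallel style.

        In each step, every element simultaneously adds the value d
        positions to its left, with d doubling each step, so the scan
        finishes in ceil(log2 n) steps. The parallel update is
        simulated by building a fresh list per step.
        """
        a = list(xs)
        d = 1
        while d < len(a):
            a = [a[i] + (a[i - d] if i >= d else 0) for i in range(len(a))]
            d *= 2
        return a

    print(inclusive_scan([1, 2, 3, 4]))  # [1, 3, 6, 10]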

1,000 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Recent work on the mapping of PRAM algorithms on bounded-degree networks is described in [3, 13, 14, 20, 25]. Our presentation on the communication complexity of the matrix-multiplication problem in the shared-memory model is taken from [1]. Data-parallel algorithms are described in [15]....


Proceedings ArticleDOI
01 May 1978
TL;DR: A model of computation based on random access machines operating in parallel and sharing a common memory is presented; in this model, nondeterministic machines can accept in polynomial time exactly the sets accepted by nondeterministic exponential-time-bounded Turing machines.
Abstract: A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAM's can accept in polynomial time exactly the sets accepted by polynomial tape bounded Turing machines; nondeterministic RAM's can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines. Similar results hold for other classes. The effect of limiting the size of the common memory is also considered.

951 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Rigorous descriptions of shared-memory models were introduced later in [11,12]....


Journal ArticleDOI
TL;DR: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2 n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time.
Abstract: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2 n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.
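
The quoted bound is an instance of Brent-style scheduling: a computation with W total operations and critical-path depth D can be run on p processors in at most W/p + D steps. A small worked sketch follows; the function and variable names are ours.

    import math

    def brent_bound(work, depth, p):
        """Brent-style scheduling bound: a computation with `work`
        operations and critical-path length `depth` runs on p
        processors in at most work / p + depth steps."""
        return work / p + depth

    # A balanced reduction over n leaves does about n - 1 operations
    # at depth ceil(log2 n); compare the bound for several p.
    n = 1024
    for p in (1, 8, 64):
        print(p, brent_bound(n - 1, math.ceil(math.log2(n)), p))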

864 citations


"An introduction to parallel algorit..." refers methods in this paper

  • ...The WT scheduling principle is derived from a theorem in [7]. In the literature, this principle is commonly referred to as Brent's theorem or Brent's scheduling principle....
