Book

An introduction to parallel algorithms

01 Oct 1992
TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, emphasizing the application of the PRAM model of parallel computation, in all its variants, to algorithm analysis.
Abstract: Written by an authority in the field, this book provides an introduction to the design and analysis of parallel algorithms. The emphasis is on the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis. Special attention is given to the selection of relevant data structures and to algorithm design principles that have proved to be useful.
Features:
  • Uses the PRAM (parallel random access machine) as the model for parallel computation.
  • Covers all essential classes of parallel algorithms.
  • Rich exercise sets.
  • Written by a highly respected author within the field.
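To give a flavor of the PRAM style the book describes, here is a minimal sketch (not taken from the book) of pointer-jumping prefix sums, simulated sequentially in Python. On a PRAM, each round's updates are independent and would run concurrently, one processor per element, giving O(log n) rounds.

```python
def prefix_sums(values):
    """Pointer-jumping (Hillis-Steele style) prefix sums.

    Runs ceil(log2 n) rounds; within a round every update reads only
    the previous round's values, so all updates could execute
    simultaneously on a PRAM.  We copy the array to emulate the
    synchronous read-then-write step."""
    out = list(values)
    d = 1
    while d < len(out):
        prev = out[:]                     # snapshot of the round's inputs
        for i in range(d, len(out)):      # each i is an independent "processor"
            out[i] = prev[i] + prev[i - d]
        d *= 2
    return out
```

For example, `prefix_sums([1, 2, 3, 4])` yields the running sums `[1, 3, 6, 10]` after two rounds.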


Citations
Journal ArticleDOI
TL;DR: This paper proposes a scheme by which a programmer explicitly designates information useful for efficiently executing parallel recursion, demonstrates performance differences among execution strategies for parallel recursive algorithms, and shows the usefulness of letting the programmer select an execution strategy.
Abstract: Two strategies exist for executing parallel recursions: evenly distributing processors to each recursive call, and performing dynamic load-balancing. An execution strategy that is efficient for one parallel recursive algorithm may be inefficient for another, depending on how the algorithm operates. In addition, execution efficiency can decrease through overparallelization of recursive calls, due to factors such as the communication efficiency of the parallel computing environment. It is not easy for a compiler to analyze these factors mechanically from the source program of a parallel recursive algorithm, select an optimal execution strategy, and suppress overparallelization. In many cases, however, a programmer understands the parallel recursive algorithm and can predict which execution strategy suits it. Thus, if a programmer can specify a parallel recursive execution strategy, or specify a decision to suppress overparallelization, parallel recursive algorithms can be executed efficiently. This paper proposes a scheme by which a programmer explicitly designates information useful for efficiently executing parallel recursion. It also shows performance differences among various execution strategies of parallel recursive algorithms, demonstrating the usefulness of giving the programmer the capability to select an execution strategy. © 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(9): 92–103, 2004; Published online in Wiley InterScience. DOI 10.1002/scj.10009
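As an illustration of the kind of programmer-supplied hint the paper argues for, here is a hedged Python sketch: a recursive parallel sum whose spawning depth is capped by a `depth_limit` argument to suppress overparallelization. The interface and names are invented for this example; they are not the paper's mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def psum(data, depth_limit=2, pool=None):
    """Recursive array sum.  depth_limit is the programmer-supplied hint
    that caps how deep recursive calls are spawned as parallel tasks;
    below the cap (or for tiny inputs) the sum runs sequentially."""
    if depth_limit == 0 or len(data) < 4:
        return sum(data)                      # sequential base case
    own_pool = pool is None
    if own_pool:
        pool = ThreadPoolExecutor(max_workers=4)
    try:
        mid = len(data) // 2
        # Spawn one half as a parallel task; evaluate the other inline.
        left = pool.submit(psum, data[:mid], depth_limit - 1, pool)
        right = psum(data[mid:], depth_limit - 1, pool)
        return left.result() + right
    finally:
        if own_pool:
            pool.shutdown()
```

Raising `depth_limit` trades more task-spawning overhead for more available parallelism, which is exactly the decision the paper says the programmer, not the compiler, is best placed to make.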

2 citations

Proceedings ArticleDOI
10 Sep 2012
TL;DR: This work proposes a new algorithm based on a two-player game that can reduce the evaluation time of a boolean circuit C by up to min(h, min(log d, log co-d)) iterations, where h is the maximal number of and-or alternations along any path in C and d (and co-d) is the algebraic degree (and co-degree) of C.
Abstract: In this work we consider the problem of fast parallel evaluation of boolean circuits, namely evaluating a boolean circuit $C$, with input leaf values, faster than its depth, which would practically require $\log depth$ iterations to complete. Finding a general parallel algorithm that can evaluate any circuit in $\log depth$ iterations is known as the Circuit Value Problem (CVP). The CVP and its approximations are known to be P-complete, and therefore a heuristic solution that works in practice for all ``real computations'' is sought. In this work we propose a new algorithm based on a two-player game that can reduce the evaluation time of a boolean circuit $C$ by up to $min(h, min(\log d, \log co$-$d))$ iterations, where $h$ is the maximal number of and-or alternations along any path in $C$ and $d$ (and $co$-$d$) is the algebraic degree (and co-degree) of $C$. This improves the theoretical bound of the MRK algorithm (Miller, Ramachandran and Kaltofen 86) for the case of parallel evaluation of boolean circuits. More importantly we show, via experiments, that for circuits emanating from real programs, the proposed algorithm can in practice evaluate circuits in log-depth iterations. Each iteration can be evaluated in parallel using a connectivity step; although this step can be implemented using log-depth boolean circuits, we consider an optical switching realization based on Optical Ring Resonators (ORRs). Due to quantum effects, propagating a light beam through a sequence of ORRs can be done with zero latency, making ORRs ideal for implementing the connectivity step required by the proposed algorithm. To obtain the needed experiments, we have extended the LLVM compiler to transform C code into boolean circuits and then simulated the optical evaluation of these circuits using the proposed two-player game.
Our experiments indeed show that circuits emanating from real applications can be evaluated in log-depth iterations of the proposed algorithm and that the optical implementation is feasible.
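The depth-bounded, round-by-round evaluation that the proposed algorithm improves on can be sketched as follows (an illustrative baseline in Python, assuming an acyclic circuit; the gate encoding is this sketch's own convention, not the paper's):

```python
def evaluate_in_rounds(gates, leaves):
    """Evaluate a boolean circuit round by round.

    In each round, every gate whose inputs are already known is
    computed; on a parallel machine these computations would all run
    simultaneously, so the number of rounds equals the circuit depth.

    gates:  {name: (op, in1, in2)} with op in {'and', 'or'}
    leaves: {name: bool} input leaf values
    Assumes the circuit is a DAG with all referenced inputs present."""
    values = dict(leaves)
    rounds = 0
    while len(values) < len(leaves) + len(gates):
        ready = [(g, op, a, b) for g, (op, a, b) in gates.items()
                 if g not in values and a in values and b in values]
        for g, op, a, b in ready:           # one PRAM round
            x, y = values[a], values[b]
            values[g] = (x and y) if op == 'and' else (x or y)
        rounds += 1
    return values, rounds
```

For a two-level circuit such as `g2 = or(and(x, y), z)` this takes two rounds; the paper's contribution is precisely to beat this depth bound for circuits with few and-or alternations or low algebraic degree.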

2 citations


Cites background from "An introduction to parallel algorit..."

  • ...Parallel evaluation of restricted types of circuits such as a chain of sums (parallel prefix) algebraic formula and recurrences have been studied as well [21], [11], [14] and [24], [4], [3]....


01 Jan 2010
TL;DR: Algorithms that produce the best presentation are described; they find a monotonic dependency that not only approximates the process but always stays below the actual values.
Abstract: Even successful financial funds do not always grow monotonically; once in a while small downfalls happen. These downfalls are perceived badly by potential customers. So, it is desirable to present the history of the fund's behavior as the sum of a monotonic trend and a fluctuating stochastic part. The smaller the stochastic part, the larger the trend; thus the best presentation corresponds to the smallest stochastic part. In this paper, we describe algorithms that produce the best presentation.
I. Formulation of the Problem in Informal Terms. Even successful financial funds do not always grow monotonically: once in a while small downfalls happen. These downfalls (underwater periods) are perceived badly by potential customers. So, it is desirable to present the history of the fund's behavior as the sum of a monotonic trend and of a fluctuating (stochastic) part. The smaller the stochastic part, the larger the trend. So, the best presentation corresponds to the smallest stochastic part. The problem of finding the best presentation was formulated in [1]. In this paper, we describe algorithms that produce the best presentation.
II. How This Problem is Related to Interval Computations. Our problem is: given a function, to find a close monotonic one. A similar problem has been considered and solved in [4, 3, 5]: Suppose that we have an unknown physical dependency y(x). We measure y for x = x1, ..., xn, and get measurement results ỹ1, ..., ỹn. Measurements are never absolutely accurate. Therefore, from the fact that the measured value is ỹi, we can conclude that the actual value of y(xi) belongs to the interval [ỹi−∆, ỹi+∆], where ∆ is the accuracy of the measuring instrument guaranteed by its manufacturer. The problem is: given xi, ỹi, and ∆, to check whether it is possible that the actual dependency is monotonic and, if the dependency is definitely not monotonic (i.e., if it always has local extrema), to find possible locations of these local extrema. Our new problem is different: we do not know ∆, and we want the resulting monotonic dependency not only to approximate the process, but to always stay below the actual values. However, we can modify the methods from [4, 3, 5] to solve this problem as well.
III. Formulation of the Problem in Mathematical Terms.
(The authors are with The World Bank, 1818 H Street, N.W., Washington, DC 20433, email gdeboeck@worldbank.org (G. J. Deboeck); with Systems Engineering, Bell Northern Research, P.O. Box 833871 M\S D0-207, Richardson, TX 75083-3871, email villa@bnr.ca (K. Villaverde); and with Computer Science Department, The University of Texas at El Paso, El Paso, TX 79968, email vladik@cs.utep.edu (V. Kreinovich). This work was partially supported by NSF grant No. CDA-9015006 and NASA Research Grant No. 9-757.)
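The core computational step described here has a compact form. A minimal sketch (my own illustration, not the paper's algorithm): the pointwise-largest non-decreasing trend that stays at or below the observed values is the running minimum taken from the right.

```python
def best_monotone_trend(values):
    """Largest non-decreasing sequence g with g[i] <= values[i].

    g[i] = min(values[i:]), computed in one right-to-left pass.
    Any non-decreasing h below the data satisfies h[i] <= h[j] <= values[j]
    for all j >= i, so h[i] <= g[i]: g is the pointwise-largest trend."""
    g = list(values)
    for i in range(len(g) - 2, -1, -1):
        g[i] = min(g[i], g[i + 1])
    return g
```

On a fund history like `[1, 3, 2, 4]` this yields the trend `[1, 2, 2, 4]`, absorbing the downfall into the stochastic part while keeping the trend below the actual values.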

2 citations


Additional excerpts

  • ..., [2])....


Posted Content
TL;DR: This paper presents the design and analysis of parallel approximation algorithms for facility-location problems, including NC and RNC algorithms for (metric) facility location, k-center, k-median, and k-means, and focuses on giving algorithms with low depth, near work efficiency, and low cache complexity.
Abstract: This paper presents the design and analysis of parallel approximation algorithms for facility-location problems, including NC and RNC algorithms for (metric) facility location, k-center, k-median, and k-means. These problems have received considerable attention during the past decades from the approximation algorithms community, concentrating primarily on improving the approximation guarantees. In this paper, we ask: is it possible to parallelize some of the beautiful results from the sequential setting? Our starting point is a small, but diverse, subset of results in approximation algorithms for facility-location problems, with a primary goal of developing techniques for devising their efficient parallel counterparts. We focus on giving algorithms with low depth, near work efficiency (compared to the sequential versions), and low cache complexity. Common to the algorithms we present is the idea that instead of picking only the most cost-effective element, we make room for parallelism by allowing a small slack (e.g., a (1+ε) factor) in what can be selected; then, we use a clean-up step to ensure that the behavior does not deviate too much from the sequential steps. All the algorithms we developed are "cache efficient" in that the cache complexity is bounded by O(w/B), where w is the work in the EREW model and B is the block size.
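The slack idea described above can be illustrated in a few lines (an illustrative sketch, not the paper's algorithm; the function name and `eps` parameter are invented here): rather than selecting the single most cost-effective element, admit everything within a (1+ε) factor of the best, so the admitted set can be processed in parallel.

```python
def near_best_candidates(costs, eps=0.1):
    """Return every element whose cost is within a (1+eps) factor of
    the cheapest.  A greedy algorithm that processes this whole set in
    one parallel step, followed by a clean-up pass, stays close to the
    sequential greedy choice while exposing parallelism."""
    best = min(costs.values())
    return {name for name, c in costs.items() if c <= (1 + eps) * best}
```

For example, with costs `{'a': 1.0, 'b': 1.05, 'c': 2.0}` and ε = 0.1, both `a` and `b` are admitted in the same round instead of only `a`.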

2 citations

Journal ArticleDOI
01 Mar 2003
TL;DR: In this paper, the authors presented an output-sensitive algorithm that finds a plane region R* such that, for any point p in R*, the total length of the k shortest rectilinear paths connecting p and the k terminals without passing through any obstacle is minimum.
Abstract: Given k terminals and n axis-parallel rectangular obstacles on the plane, our algorithm finds a plane region R* such that, for any point p in R*, the total length of the k shortest rectilinear paths connecting p and the k terminals without passing through any obstacle is minimum. The algorithm is output-sensitive, and takes O((K+n) log n) time and O(K+n) space if k is a fixed constant, where K is the total number of polygonal vertices of the found region R*.

2 citations


Cites background from "An introduction to parallel algorit..."

  • ...Note that there are NC parallel algorithms for the shortest path problem on plane graphs [2], [7], [8] and for the plane sweep [7]....


References
Book
01 Sep 1991
TL;DR: This book covers parallel algorithms and architectures for arrays and trees, meshes of trees, and hypercubes and related networks, including sorting on a linear array and the systolic and semisystolic models of computation.
Abstract: Preface Acknowledgments Notation 1 Arrays and Trees 1.1 Elementary Sorting and Counting 1.1.1 Sorting on a Linear Array Assessing the Performance of the Algorithm Sorting N Numbers with Fewer Than N Processors 1.1.2 Sorting in the Bit Model 1.1.3 Lower Bounds 1.1.4 A Counterexample-Counting 1.1.5 Properties of the Fixed-Connection Network Model 1.2 Integer Arithmetic 1.2.1 Carry-Lookahead Addition 1.2.2 Prefix Computations-Segmented Prefix Computations 1.2.3 Carry-Save Addition 1.2.4 Multiplication and Convolution 1.2.5 Division and Newton Iteration 1.3 Matrix Algorithms 1.3.1 Elementary Matrix Products 1.3.2 Algorithms for Triangular Matrices 1.3.3 Algorithms for Tridiagonal Matrices -Odd-Even Reduction -Parallel Prefix Algorithms 1.3.4 Gaussian Elimination 1.3.5 Iterative Methods -Jacobi Relaxation -Gauss-Seidel Relaxation Finite Difference Methods -Multigrid Methods 1.4 Retiming and Systolic Conversion 1.4.1 A Motivating Example-Palindrome Recognition 1.4.2 The Systolic and Semisystolic Model of Computation 1.4.3 Retiming Semisystolic Networks 1.4.4 Conversion of a Semisystolic Network into a Systolic Network 1.4.5 The Special Case of Broadcasting 1.4.6 Retiming the Host 1.4.7 Design by Systolic Conversion-A Summary 1.5 Graph Algorithms 1.5.1 Transitive Closure 1.5.2 Connected Components 1.5.3 Shortest Paths 1.5.4 Breadth-First Spanning Trees 1.5.5 Minimum Weight Spanning Trees 1.6 Sorting Revisited 1.6.1 Odd-Even Transposition Sort on a Linear Array 1.6.2 A Simple Root-N(log N + 1)-Step Sorting Algorithm 1.6.3 A (3 Root- N + o(Root-N))-Step Sorting Algorithm 1.6.4 A Matching Lower Bound 1.7 Packet Routing 1.7.1 Greedy Algorithms 1.7.2 Average-Case Analysis of Greedy Algorithms -Routing N Packets to Random Destinations -Analysis of Dynamic Routing Problems 1.7.3 Randomized Routing Algorithms 1.7.4 Deterministic Algorithms with Small Queues 1.7.5 An Off-line Algorithm 1.7.6 Other Routing Models and Algorithms 1.8 Image Analysis and Computational 
Geometry 1.8.1 Component-Labelling Algorithms -Levialdi's Algorithm -An O (Root-N)-Step Recursive Algorithm 1.8.2 Computing Hough Transforms 1.8.3 Nearest-Neighbor Algorithms 1.8.4 Finding Convex Hulls 1.9 Higher-Dimensional Arrays 1.9.1 Definitions and Properties 1.9.2 Matrix Multiplication 1.9.3 Sorting 1.9.4 Packet Routing 1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays 1.10 problems 1.11 Bibliographic Notes 2 Meshes of Trees 2.1 The Two-Dimensional Mesh of Trees 2.1.1 Definition and Properties 2.1.2 Recursive Decomposition 2.1.3 Derivation from KN,N 2.1.4 Variations 2.1.5 Comparison With the Pyramid and Multigrid 2.2 Elementary O(log N)-Step Algorithms 2.2.1 Routing 2.2.2 Sorting 2.2.3 Matrix-Vector Multiplication 2.2.4 Jacobi Relaxation 2.2.5 Pivoting 2.2.6 Convolution 2.2.7 Convex Hull 2.3 Integer Arithmetic 2.3.1 Multiplication 2.3.2 Division and Chinese Remaindering 2.3.3 Related Problems -Iterated Products -Rooting Finding 2.4 Matrix Algorithms 2.4.1 The Three-Dimensional Mesh of Trees 2.4.2 Matrix Multiplication 2.4.3 Inverting Lower Triangular Matrices 2.4.4 Inverting Arbitrary Matrices -Csanky's Algorithm -Inversion by Newton Iteration 2.4.5 Related Problems 2.5 Graph Algorithms 2.5.1 Minimum-Weight Spanning Trees 2.5.2 Connected Components 2.5.3 Transitive Closure 2.5.4 Shortest Paths 2.5.5 Matching Problems 2.6 Fast Evaluation of Straight-Line Code 2.6.1 Addition and Multiplication Over a Semiring 2.6.2 Extension to Codes with Subtraction and Division 2.6.3 Applications 2.7 Higher-Dimensional meshes of Trees 2.7.1 Definitions and Properties 2.7.2 The Shuffle-Tree Graph 2.8 Problems 2.9 Bibliographic Notes 3 Hypercubes and Related Networks 3.1 The Hypercube 3.1.1 Definitions and Properties 3.1.2 Containment of Arrays -Higher-Dimensional Arrays -Non-Power-of-2 Arrays 3.1.3 Containment of Complete Binary Trees 3.1.4 Embeddings of Arbitrary Binary Trees -Embeddings with Dilation 1 and Load O(M over N + log N) -Embeddings with Dilation 
O(1) and Load O (M over N + 1) -A Review of One-Error-Correcting Codes -Embedding Plog N into Hlog N 3.1.5 Containment of Meshes of Trees 3.1.6 Other Containment Results 3.2 The Butterfly, Cube-Connected-Cycles , and Benes Network 3.2.1 Definitions and Properties 3.2.2 Simulation of Arbitrary Networks 3.2.3 Simulation of Normal Hypercube Algorithms 3.2.4 Some Containment and Simulation Results 3.3 The Shuffle-Exchange and de Bruijn Graphs 3.3.1 Definitions and Properties 3.3.2 The Diaconis Card Tricks 3.3.3 Simulation of Normal Hypercube Algorithms 3.3.4 Similarities with the Butterfly 3.3.5 Some Containment and Simulation Results 3.4 Packet-Routing Algorithms 3.4.1 Definitions and Routing Models 3.4.2 Greedy Routing Algorithms and Worst-Case Problems 3.4.3 Packing, Spreading, and Monotone Routing Problems -Reducing a Many-to-Many Routing Problem to a Many-to-One Routing Problem -Reducing a Routing Problem to a Sorting Problem 3.4.4 The Average-Case Behavior of the Greedy Algorithm -Bounds on Congestion -Bounds on Running Time -Analyzing Non-Predictive Contention-Resolution Protocols 3.4.5 Converting Worst-Case Routing Problems into Average-Case Routing Problems -Hashing -Randomized Routing 3.4.6 Bounding Queue Sizes -Routing on Arbitrary Levelled Networks 3.4.7 Routing with Combining 3.4.8 The Information Dispersal Approach to Routing -Using Information Dispersal to Attain Fault-Tolerance -Finite Fields and Coding Theory 3.4.9 Circuit-Switching Algorithms 3.5 Sorting 3.5.1 Odd-Even Merge Sort -Constructing a Sorting Circuit with Depth log N(log N +1)/2 3.5.2 Sorting Small Sets 3.5.3 A Deterministic O(log N log log N)-Step Sorting Algorithm 3.5.4 Randomized O(log N)-Step Sorting Algorithms -A Circuit with Depth 7.45 log N that Usually Sorts 3.6 Simulating a Parallel Random Access Machine 3.6.1 PRAM Models and Shared Memories 3.6.2 Randomized Simulations Based on Hashing 3.6.3 Deterministic Simulations using Replicated Data 3.6.4 Using Information Dispersal to 
Improve Performance 3.7 The Fast Fourier Transform 3.7.1 The Algorithm 3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph 3.7.3 Application to Convolution and Polynomial Arithmetic 3.7.4 Application to Integer Multiplication 3.8 Other Hypercubic Networks 3.8.1 Butterflylike Networks -The Omega Network -The Flip Network -The Baseline and Reverse Baseline Networks -Banyan and Delta Networks -k-ary Butterflies 3.8.2 De Bruijn-Type Networks -The k-ary de Bruijn Graph -The Generalized Shuffle-Exchange Graph 3.9 Problems 3.10 Bibliographic Notes Bibliography Index Lemmas, Theorems, and Corollaries Author Index Subject Index

2,895 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Multiprocessor-based computers have been around for decades and various types of computer architectures [2] have been implemented in hardware throughout the years with different types of advantages/performance gains depending on the application....


  • ...Every location in the array represents a node of the tree: T [1] is the root, with children at T [2] and T [3]....

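The implicit array layout of a complete binary tree mentioned in the excerpt above can be captured in a couple of lines (Python used for illustration; 1-indexed, as in the excerpt):

```python
def children(i):
    """1-indexed array layout of a complete binary tree: T[1] is the
    root, and node i has children at indices 2*i and 2*i + 1."""
    return 2 * i, 2 * i + 1

def parent(i):
    """Inverse of the layout: the parent of node i is i // 2."""
    return i // 2
```

Because parent and child indices are computed arithmetically, no pointers are needed; this is what makes the layout convenient for PRAM tree algorithms.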

  • ...The text by [2] is a good start as it contains a comprehensive description of algorithms and different architecture topologies for the network model (tree, hypercube, mesh, and butterfly)....


Book
01 Jan 1984
TL;DR: The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.
Abstract: The book is intended as a text to support two semesters of courses in computer architecture at the college senior and graduate levels. There are excellent problems for students at the end of each chapter. The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

1,410 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Parallel architectures have been described in several books (see, for example, [18, 29])....


Journal ArticleDOI
TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
Abstract: Parallel computers with tens of thousands of processors are typically programmed in a data parallel style, as opposed to the control parallel style used in multiprocessing. The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

1,000 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Recent work on the mapping of PRAM algorithms on bounded-degree networks is described in [3, 13, 14, 20, 25]. Our presentation on the communication complexity of the matrix-multiplication problem in the shared-memory model is taken from [1]. Data-parallel algorithms are described in [15]....


Proceedings ArticleDOI
01 May 1978
TL;DR: A model of computation based on random access machines operating in parallel and sharing a common memory is presented; nondeterministic parallel RAMs accept in polynomial time exactly the sets accepted by nondeterministic exponential-time-bounded Turing machines.
Abstract: A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAM's can accept in polynomial time exactly the sets accepted by polynomial tape bounded Turing machines; nondeterministic RAM's can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines. Similar results hold for other classes. The effect of limiting the size of the common memory is also considered.

951 citations


"An introduction to parallel algorit..." refers background in this paper

  • ...Rigorous descriptions of shared-memory models were introduced later in [11,12]....


Journal ArticleDOI
TL;DR: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2 n + 10(n - 1)/p using p processors which can independently perform arithmetic operations in unit time.
Abstract: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.

864 citations


"An introduction to parallel algorit..." refers methods in this paper

  • ...The WT scheduling principle is derived from a theorem in [7]. In the literature, this principle is commonly referred to as Brent's theorem or Brent's scheduling principle....

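The scheduling principle referred to in this excerpt, commonly called Brent's theorem, is a standard result: if an algorithm performs W(n) total operations (work) and runs in T(n) parallel time in the work-time framework, then it can be simulated on a p-processor PRAM in time

```latex
T_p(n) \;=\; O\!\left(\frac{W(n)}{p} + T(n)\right)
```

so a single work-time description of an algorithm immediately yields an efficient schedule for any fixed number of processors.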