An introduction to parallel algorithms

Book

Joseph JaJa

01 Oct 1992

TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, with the emphasis on the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.

Abstract: Written by an authority in the field, this book provides an introduction to the design and analysis of parallel algorithms. The emphasis is on the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis. Special attention is given to the selection of relevant data structures and to algorithm design principles that have proved to be useful.
Features:
* Uses the PRAM (parallel random access machine) as the model for parallel computation.
* Covers all essential classes of parallel algorithms.
* Rich exercise sets.
* Written by a highly respected author within the field.
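To make the PRAM variants mentioned above concrete, here is a minimal sketch, assuming a toy Python simulation that is not taken from the book; `pram_step`, `VARIANTS`, and the variant strings are hypothetical names. In one synchronous step every processor reads the old shared memory, computes, and then writes. The variant decides what happens when several processors touch the same cell: EREW forbids concurrent reads and writes, CREW forbids concurrent writes, and the CRCW variants resolve conflicting writes by the common, arbitrary, or priority rule.

```python
from collections import defaultdict

VARIANTS = ("EREW", "CREW", "CRCW-common", "CRCW-arbitrary", "CRCW-priority")


def pram_step(memory, programs, variant="CRCW-priority"):
    """Simulate one synchronous PRAM step (toy model; names are hypothetical).

    programs[pid] = (read_addrs, compute): processor pid reads the cells in
    read_addrs, then compute(values) returns a list of (addr, value) writes.
    A lower pid means higher priority on a priority CRCW PRAM.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}")

    # Read phase: all processors read the OLD memory before anyone writes.
    readers = defaultdict(set)
    values = []
    for pid, (addrs, _) in enumerate(programs):
        for a in addrs:
            readers[a].add(pid)
        values.append([memory[a] for a in addrs])
    if variant == "EREW" and any(len(r) > 1 for r in readers.values()):
        raise ValueError("concurrent read is not allowed on an EREW PRAM")

    # Compute phase: collect every write, grouped by target cell.
    writes = defaultdict(list)
    for pid, (_, compute) in enumerate(programs):
        for addr, val in compute(values[pid]):
            writes[addr].append((pid, val))

    # Write phase: resolve concurrent writes according to the variant.
    new_memory = list(memory)
    for addr, ws in writes.items():
        if len(ws) > 1:
            if variant in ("EREW", "CREW"):
                raise ValueError(f"concurrent write to cell {addr} on {variant}")
            if variant == "CRCW-common" and len({v for _, v in ws}) > 1:
                raise ValueError("common CRCW requires all writers to agree")
        # Priority: the lowest pid wins. Arbitrary: any writer may win;
        # this sketch simply reuses the lowest pid.
        new_memory[addr] = min(ws, key=lambda w: w[0])[1]
    return new_memory


# Usage: OR of four bits in one step on a common CRCW PRAM (cell 4 = result).
bits = [1, 0, 1, 0]
progs = [([i], lambda v: [(4, 1)] if v[0] else []) for i in range(4)]
print(pram_step(bits + [0], progs, "CRCW-common")[4])  # 1
```

The OR example works in a single step on a common CRCW PRAM because every processor that writes, writes the same value; running the same program with `"CREW"` raises an error instead.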

Citations

Journal Article

REPLICA MBTAC: multithreaded dual-mode processor

Martti Forsell, Jussi Roivainen, Ville Leppänen¹

Information Technology University¹

01 May 2018-The Journal of Supercomputing

TL;DR: This paper introduces the new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP, which provides a modern, sophisticated way of writing general-purpose parallel programs, backed by native execution capabilities for key concepts.

Abstract: The prevailing trend in the design of chip multiprocessors (CMPs) has been to replicate single-core processors. As a result, such CMPs typically define an asynchronous computational model, require heavily locality-aware memory allocation, and incur high intercommunication overheads. These properties make parallel programming very challenging and error-prone. We introduce our new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP. It provides a modern, sophisticated way of writing general-purpose parallel programs, backed by native execution capabilities for key concepts. These include support for cost-efficient machine-instruction-level synchronization and a uniform shared global memory that makes memory allocation of data structures and data movement easy to program. MBTAC uses a low-overhead thread-context-switching solution, and its functional-unit organization is designed for parallel computing, exploiting inter-thread instruction-level parallelism and highly efficient multioperations. To evaluate our proposal, we implemented three MBTAC constellations featuring up to 2048 parallel threads on FPGA and compared them with DLX and Intel's Core i7 processors. The results point toward high performance in communication-intensive problems, simplified parallel programmability, and a regular, implementation-friendly structure.

2 citations

Cites methods from "An introduction to parallel algorit..."

...Unlike current processors, the proposed processor will efficiently execute synchronous algorithms with non-locality-aware memory allocation [13]....
[...]

Distributed algorithms for overlay networks and programmable matter

Robert Gmyr

01 Jan 2018

TL;DR: This dissertation introduces network protocols that maintain the connectivity of an overlay network under massive adversarial churn or denial-of-service attacks, presents a self-stabilizing algorithm for the construction of metric graphs, and begins the study of hybrid networks by investigating the problem of continuously monitoring properties of an externally-controlled network with the help of an overlay network.

Abstract: This dissertation consists of two parts that are dedicated to the study of distributed algorithms for overlay networks and programmable matter. The first part revolves around the topics of robustness against attacks, recovery from faults, and monitoring network properties in the context of overlay networks. More specifically, we introduce network protocols that maintain the connectivity of an overlay network under massive adversarial churn or denial-of-service attacks, we present a self-stabilizing algorithm for the construction of metric graphs, and we initiate the study of hybrid networks by investigating the problem of continuously monitoring properties of an externally-controlled network with the help of an overlay network. In the second part we investigate the algorithmic foundations of programmable matter. Programmable matter refers to a substance that can change its shape or other physical properties in a programmable fashion. We envision programmable matter consisting of simple computational devices that are able to self-organize in order to achieve a collective goal without any central control or external intervention. We present efficient algorithms for the fundamental problems of leader election and shape formation for programmable matter.

2 citations

Cites methods from "An introduction to parallel algorit..."

...Pointer jumping (which is also known as pointer doubling or simply the doubling technique) is a well-known technique in the area of parallel computing [JáJ92]....
[...]
...For example, we make extensive use of pointer jumping [JáJ92], a technique that is often used in algorithms for parallel random-...
[...]
..., a node introduces its neighbors to its neighbors) is a well-known technique in the area of parallel computing [JáJ92], but to our great surprise, it seems that it has never been combined with random walks so far....
[...]
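Pointer jumping, as described in the excerpts above, is commonly illustrated with list ranking. A minimal sketch, assuming a sequential Python simulation in which each list comprehension stands in for one synchronous round with one processor per node; `list_rank` is a hypothetical name, and the input is assumed to be a single linked list whose tail points to itself (a cycle would not terminate):

```python
def list_rank(succ):
    """List ranking by synchronous pointer jumping (a sequential simulation).

    succ[i] is the successor of node i, and the tail points to itself.
    Returns (rank, rounds), where rank[i] is the number of links from
    node i to the tail.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[i] = distance in the original list from i to nxt[i].
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Both comprehensions read only the previous round's arrays, which
        # mimics all processors updating simultaneously.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


# Usage: the list 3 -> 0 -> 4 -> 1 -> 2 (tail).
print(list_rank([4, 2, 2, 0, 1]))  # ([3, 1, 0, 4, 2], 2)
```

Each round doubles the distance spanned by every pointer, so about log₂ n rounds suffice; the trade-off is O(n log n) total operations instead of the O(n) of a sequential traversal.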

Journal Article

Nested Data Parallelism: An Introductory Overview (Verschachtelter Datenparallelismus – Ein einführender Überblick)

Wolf Pfannenstiel¹

Technical University of Berlin¹

01 Dec 1999-Informatik - Forschung Und Entwicklung

TL;DR: The nested data-parallel programming model has some of the desired properties of a parallel programming model and with this model it is possible to express irregular data structures and irregular parallel computations directly.

Abstract: Data-parallel programming models are currently the most successful programming models for parallel computers, both in terms of execution efficiency and programming complexity. However, no one has yet succeeded in developing a parallel programming model that is both conceptually simple and abstract and can also be mapped efficiently onto the full range of parallel computer architectures. The nested data-parallel programming model has some of the desired properties of a parallel programming model. In contrast to the common flat data-parallel models, nested data-parallel models provide mechanisms that directly support irregular data structures and computations. This article presents a collection-based approach to nested data parallelism, gives an overview of the state of research, and points out open questions in the field.

2 citations

Cites background from "An introduction to parallel algorit..."

...There are a number of nested data-parallel programming languages which, because of their abstraction, architecture independence, and associated abstract cost calculus, are also used for prototyping parallel algorithms and as teaching languages [31, 6]....
[...]
...The nested data-parallel model is more general in this respect, because the arbitrary nesting of parallel computations makes it easy to formulate irregular algorithms....
[...]
...The abstract measures work (W) and depth (D) are the usual cost measures of parallel algorithms [22]....
[...]
...The class of divide-and-conquer algorithms, for example, can only be expressed elegantly in the nested data-parallel style, since it allows parallelism to be exploited both within the functions (for splitting the input data and for combining the partial results) and across the recursive calls, which can be executed in parallel with one another....
[...]
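The divide-and-conquer point in the last excerpt can be sketched with quicksort, modeled loosely on the example often used to introduce nested data parallelism (for instance in Blelloch's NESL). This is a sequential Python simulation under our own assumptions, not code from the cited article, and `nested_quicksort` is a hypothetical name: the three filters are the data-parallel splitting step, and the two recursive calls are independent, so a nested data-parallel language could run them in parallel. The returned depth counts recursion levels only, not individual PRAM steps.

```python
def nested_quicksort(a):
    """Quicksort in nested data-parallel style (sequential simulation).

    Returns (sorted_list, depth), where depth is the number of recursion
    levels on the longest path, i.e. rounds of parallel splitting.
    """
    if len(a) <= 1:
        return list(a), 0
    pivot = a[len(a) // 2]
    # Splitting: three element-wise filters, each data parallel over `a`.
    lesser = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    # The two recursive calls are independent; in a nested data-parallel
    # language they would be applied in parallel over [lesser, greater].
    # Here they simply run one after the other.
    (ls, ld), (gs, gd) = [nested_quicksort(s) for s in (lesser, greater)]
    return ls + equal + gs, 1 + max(ld, gd)


print(nested_quicksort([3, 1, 4, 1, 5, 9, 2, 6])[0])  # [1, 1, 2, 3, 4, 5, 6, 9]
```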

Journal Article

Parallel Output-Sensitive Algorithms for Combinatorial and Linear Algebra Problems

John H. Reif¹

Duke University¹

01 May 2001-Journal of Computer and System Sciences

TL;DR: The paper gives output-sensitive parallel algorithms whose performance depends on the output size and which are significantly more efficient than previous algorithms for problems with sufficiently small output size.

2 citations

Cites methods from "An introduction to parallel algorit..."

...using n log n processors, by the algorithm of Ladner and Fischer [1] (also see Reif [29] and JaJa [13])....
[...]
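The prefix computation mentioned in this excerpt can be sketched with the simple recursive-doubling scan often associated with Hillis and Steele. This is not Ladner and Fischer's construction; it is a minimal Python sketch, with the hypothetical name `doubling_scan`, that uses n processors for ⌈log₂ n⌉ synchronous rounds and O(n log n) operations in total. It works for any associative operator, commutative or not.

```python
import operator


def doubling_scan(xs, op=operator.add):
    """Inclusive prefix computation by recursive doubling.

    After the round with offset `step`, ys[i] combines the inputs
    xs[max(0, i - 2*step + 1)] .. xs[i]. Returns (prefixes, rounds).
    """
    ys = list(xs)
    n = len(ys)
    step = 1
    rounds = 0
    while step < n:
        # One synchronous round: every position reads the OLD ys; the left
        # operand covers the earlier range, so op need not be commutative.
        ys = [op(ys[i - step], ys[i]) if i >= step else ys[i] for i in range(n)]
        step *= 2
        rounds += 1
    return ys, rounds


print(doubling_scan([3, 1, 4, 1, 5, 9, 2, 6]))
# ([3, 4, 8, 9, 14, 23, 25, 31], 3)
```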

Proceedings Article

Parallel algorithms on strings

Wojciech Rytter¹

University of Warsaw¹

01 Jan 2007

2 citations

Cites background from "An introduction to parallel algorit..."

...Example: For our example string x = babaabababba# we have SUF_Z = [2, 5, 9, 12, 3, 6, 8, 11]....
[...]
...Let us consider SUF[6] = 1; the branch from the root to 1 is a separator of the suffix tree, and it separates the tree into two subtrees with a common branch: the first subtree is for the first half α₁ = [12, 11, 3, 6, 9, 1] and the second subtree for α₂ = [1, 4, 7, 10, 2, 5, 8]....
[...]
...For this string we have: SUF = [4, 2, 5, 7, 9, 12, 3, 1, 6, 8, 11, 10], LCP = [0, 1, 3, 4, 2, 1, 0, 2, 4, 3, 2, 1]...
[...]
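The SUF and LCP values quoted above can be checked with a brute-force sequential sketch; this is neither a parallel algorithm nor the method of the cited paper, and the function names are hypothetical. It assumes 1-based positions and, as the quoted SUF implies (suffix 12, "a#", comes after suffix 9, "abba#"), that `#` ranks above every letter. Reading SUF_Z as the suffix array restricted to positions i with i mod 3 ≠ 1, as in skew/DC3-style constructions, is our inference; the last line shows that this filter reproduces the quoted list.

```python
def suffix_array(x, sentinel="#"):
    """1-based suffix array of x, ranking the sentinel ABOVE every letter."""
    big = chr(0x10FFFF)               # larger than any ordinary letter
    y = x.replace(sentinel, big)
    # Suffix i (1-based) is y[i-1:]; the lone sentinel suffix is omitted.
    return sorted(range(1, len(x)), key=lambda i: y[i - 1:])


def lcp_array(x, sa):
    """LCP[k] = longest common prefix of suffixes sa[k-1] and sa[k]; LCP[0] = 0."""
    def lcp(i, j):
        n = 0
        while max(i, j) - 1 + n < len(x) and x[i - 1 + n] == x[j - 1 + n]:
            n += 1
        return n
    return [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]


x = "babaabababba#"
sa = suffix_array(x)
print(sa)                             # [4, 2, 5, 7, 9, 12, 3, 1, 6, 8, 11, 10]
print(lcp_array(x, sa))               # [0, 1, 3, 4, 2, 1, 0, 2, 4, 3, 2, 1]
print([i for i in sa if i % 3 != 1])  # [2, 5, 9, 12, 3, 6, 8, 11]
```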

References

Book

Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes

F. Thomson Leighton

01 Sep 1991

TL;DR: This book covers parallel algorithms and architectures for fixed-connection networks in three parts (arrays and trees; meshes of trees; hypercubes and related networks), starting with sorting on a linear array and the systolic and semisystolic models of computation.

Abstract (table of contents): Preface; Acknowledgments; Notation.

1 Arrays and Trees. 1.1 Elementary Sorting and Counting: 1.1.1 Sorting on a Linear Array (Assessing the Performance of the Algorithm; Sorting N Numbers with Fewer Than N Processors); 1.1.2 Sorting in the Bit Model; 1.1.3 Lower Bounds; 1.1.4 A Counterexample: Counting; 1.1.5 Properties of the Fixed-Connection Network Model. 1.2 Integer Arithmetic: 1.2.1 Carry-Lookahead Addition; 1.2.2 Prefix Computations (Segmented Prefix Computations); 1.2.3 Carry-Save Addition; 1.2.4 Multiplication and Convolution; 1.2.5 Division and Newton Iteration. 1.3 Matrix Algorithms: 1.3.1 Elementary Matrix Products; 1.3.2 Algorithms for Triangular Matrices; 1.3.3 Algorithms for Tridiagonal Matrices (Odd-Even Reduction; Parallel Prefix Algorithms); 1.3.4 Gaussian Elimination; 1.3.5 Iterative Methods (Jacobi Relaxation; Gauss-Seidel Relaxation; Finite Difference Methods; Multigrid Methods). 1.4 Retiming and Systolic Conversion: 1.4.1 A Motivating Example: Palindrome Recognition; 1.4.2 The Systolic and Semisystolic Model of Computation; 1.4.3 Retiming Semisystolic Networks; 1.4.4 Conversion of a Semisystolic Network into a Systolic Network; 1.4.5 The Special Case of Broadcasting; 1.4.6 Retiming the Host; 1.4.7 Design by Systolic Conversion: A Summary. 1.5 Graph Algorithms: 1.5.1 Transitive Closure; 1.5.2 Connected Components; 1.5.3 Shortest Paths; 1.5.4 Breadth-First Spanning Trees; 1.5.5 Minimum Weight Spanning Trees. 1.6 Sorting Revisited: 1.6.1 Odd-Even Transposition Sort on a Linear Array; 1.6.2 A Simple √N(log N + 1)-Step Sorting Algorithm; 1.6.3 A (3√N + o(√N))-Step Sorting Algorithm; 1.6.4 A Matching Lower Bound. 1.7 Packet Routing: 1.7.1 Greedy Algorithms; 1.7.2 Average-Case Analysis of Greedy Algorithms (Routing N Packets to Random Destinations; Analysis of Dynamic Routing Problems); 1.7.3 Randomized Routing Algorithms; 1.7.4 Deterministic Algorithms with Small Queues; 1.7.5 An Off-line Algorithm; 1.7.6 Other Routing Models and Algorithms. 1.8 Image Analysis and Computational Geometry: 1.8.1 Component-Labelling Algorithms (Levialdi's Algorithm; An O(√N)-Step Recursive Algorithm); 1.8.2 Computing Hough Transforms; 1.8.3 Nearest-Neighbor Algorithms; 1.8.4 Finding Convex Hulls. 1.9 Higher-Dimensional Arrays: 1.9.1 Definitions and Properties; 1.9.2 Matrix Multiplication; 1.9.3 Sorting; 1.9.4 Packet Routing; 1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays. 1.10 Problems. 1.11 Bibliographic Notes.

2 Meshes of Trees. 2.1 The Two-Dimensional Mesh of Trees: 2.1.1 Definition and Properties; 2.1.2 Recursive Decomposition; 2.1.3 Derivation from K_{N,N}; 2.1.4 Variations; 2.1.5 Comparison With the Pyramid and Multigrid. 2.2 Elementary O(log N)-Step Algorithms: 2.2.1 Routing; 2.2.2 Sorting; 2.2.3 Matrix-Vector Multiplication; 2.2.4 Jacobi Relaxation; 2.2.5 Pivoting; 2.2.6 Convolution; 2.2.7 Convex Hull. 2.3 Integer Arithmetic: 2.3.1 Multiplication; 2.3.2 Division and Chinese Remaindering; 2.3.3 Related Problems (Iterated Products; Root Finding). 2.4 Matrix Algorithms: 2.4.1 The Three-Dimensional Mesh of Trees; 2.4.2 Matrix Multiplication; 2.4.3 Inverting Lower Triangular Matrices; 2.4.4 Inverting Arbitrary Matrices (Csanky's Algorithm; Inversion by Newton Iteration); 2.4.5 Related Problems. 2.5 Graph Algorithms: 2.5.1 Minimum-Weight Spanning Trees; 2.5.2 Connected Components; 2.5.3 Transitive Closure; 2.5.4 Shortest Paths; 2.5.5 Matching Problems. 2.6 Fast Evaluation of Straight-Line Code: 2.6.1 Addition and Multiplication Over a Semiring; 2.6.2 Extension to Codes with Subtraction and Division; 2.6.3 Applications. 2.7 Higher-Dimensional Meshes of Trees: 2.7.1 Definitions and Properties; 2.7.2 The Shuffle-Tree Graph. 2.8 Problems. 2.9 Bibliographic Notes.

3 Hypercubes and Related Networks. 3.1 The Hypercube: 3.1.1 Definitions and Properties; 3.1.2 Containment of Arrays (Higher-Dimensional Arrays; Non-Power-of-2 Arrays); 3.1.3 Containment of Complete Binary Trees; 3.1.4 Embeddings of Arbitrary Binary Trees (Embeddings with Dilation 1 and Load O(M/N + log N); Embeddings with Dilation O(1) and Load O(M/N + 1); A Review of One-Error-Correcting Codes; Embedding P_{log N} into H_{log N}); 3.1.5 Containment of Meshes of Trees; 3.1.6 Other Containment Results. 3.2 The Butterfly, Cube-Connected-Cycles, and Benes Network: 3.2.1 Definitions and Properties; 3.2.2 Simulation of Arbitrary Networks; 3.2.3 Simulation of Normal Hypercube Algorithms; 3.2.4 Some Containment and Simulation Results. 3.3 The Shuffle-Exchange and de Bruijn Graphs: 3.3.1 Definitions and Properties; 3.3.2 The Diaconis Card Tricks; 3.3.3 Simulation of Normal Hypercube Algorithms; 3.3.4 Similarities with the Butterfly; 3.3.5 Some Containment and Simulation Results. 3.4 Packet-Routing Algorithms: 3.4.1 Definitions and Routing Models; 3.4.2 Greedy Routing Algorithms and Worst-Case Problems; 3.4.3 Packing, Spreading, and Monotone Routing Problems (Reducing a Many-to-Many Routing Problem to a Many-to-One Routing Problem; Reducing a Routing Problem to a Sorting Problem); 3.4.4 The Average-Case Behavior of the Greedy Algorithm (Bounds on Congestion; Bounds on Running Time; Analyzing Non-Predictive Contention-Resolution Protocols); 3.4.5 Converting Worst-Case Routing Problems into Average-Case Routing Problems (Hashing; Randomized Routing); 3.4.6 Bounding Queue Sizes (Routing on Arbitrary Levelled Networks); 3.4.7 Routing with Combining; 3.4.8 The Information Dispersal Approach to Routing (Using Information Dispersal to Attain Fault-Tolerance; Finite Fields and Coding Theory); 3.4.9 Circuit-Switching Algorithms. 3.5 Sorting: 3.5.1 Odd-Even Merge Sort (Constructing a Sorting Circuit with Depth log N(log N + 1)/2); 3.5.2 Sorting Small Sets; 3.5.3 A Deterministic O(log N log log N)-Step Sorting Algorithm; 3.5.4 Randomized O(log N)-Step Sorting Algorithms (A Circuit with Depth 7.45 log N that Usually Sorts). 3.6 Simulating a Parallel Random Access Machine: 3.6.1 PRAM Models and Shared Memories; 3.6.2 Randomized Simulations Based on Hashing; 3.6.3 Deterministic Simulations using Replicated Data; 3.6.4 Using Information Dispersal to Improve Performance. 3.7 The Fast Fourier Transform: 3.7.1 The Algorithm; 3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph; 3.7.3 Application to Convolution and Polynomial Arithmetic; 3.7.4 Application to Integer Multiplication. 3.8 Other Hypercubic Networks: 3.8.1 Butterflylike Networks (The Omega Network; The Flip Network; The Baseline and Reverse Baseline Networks; Banyan and Delta Networks; k-ary Butterflies); 3.8.2 De Bruijn-Type Networks (The k-ary de Bruijn Graph; The Generalized Shuffle-Exchange Graph). 3.9 Problems. 3.10 Bibliographic Notes.

Bibliography; Index; Lemmas, Theorems, and Corollaries; Author Index; Subject Index.
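Section 1.6.1 of the outline above, odd-even transposition sort on a linear array, can be sketched as a minimal sequential Python simulation (the function name is hypothetical and the code is not taken from the book). In each step, disjoint neighboring pairs compare-exchange, alternating between pairs starting at even and at odd positions; N steps are enough to sort N values on N processors.

```python
def odd_even_transposition_sort(a):
    """Simulate odd-even transposition sort on a linear array of len(a) processors.

    In step t, processors pair up as (0,1), (2,3), ... when t is even and
    as (1,2), (3,4), ... when t is odd; each pair compare-exchanges.
    """
    a = list(a)
    n = len(a)
    for t in range(n):  # N steps suffice for N values
        start = t % 2
        # Pairs within one step are disjoint, so on a linear array these
        # compare-exchanges would all happen in the same parallel step.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```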

2,895 citations

"An introduction to parallel algorit..." refers background in this paper

...Multiprocessor-based computers have been around for decades, and various types of computer architectures [2] have been implemented in hardware over the years, with different advantages and performance gains depending on the application....
[...]
...Every location in the array represents a node of the tree: T [1] is the root, with children at T [2] and T [3]....
[...]
...The text by [2] is a good start as it contains a comprehensive description of algorithms and different architecture topologies for the network model (tree, hypercube, mesh, and butterfly)....
[...]
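The array layout in the excerpts (T[1] is the root, with children at T[2] and T[3], and, generalizing, node i has children T[2i] and T[2i + 1]) supports a bottom-up tree reduction. A minimal Python sketch, assuming the number of leaves is a power of two and using the hypothetical name `tree_reduce`; each pass of the outer loop corresponds to one parallel step, because nodes on the same level depend only on the level below.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Balanced-binary-tree reduction stored in a 1-based array T.

    Returns (T, steps): T[1] holds the result, and steps is the number
    of levels processed, i.e. log2(len(values)) parallel steps.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes len(values) is a power of two")
    T = [None] * (2 * n)        # T[0] is unused so that the root is T[1]
    T[n:2 * n] = values         # leaves occupy T[n] .. T[2n - 1]
    level_start = n // 2        # first index of the level above the leaves
    steps = 0
    while level_start >= 1:
        # Nodes level_start .. 2*level_start - 1 read only the level below,
        # so a PRAM could update them all in one parallel step.
        for i in range(level_start, 2 * level_start):
            T[i] = op(T[2 * i], T[2 * i + 1])
        level_start //= 2
        steps += 1
    return T, steps


T, steps = tree_reduce([5, 1, 4, 2, 8, 3, 7, 6])
print(T[1], T[2], T[3], steps)  # 36 12 24 3
```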

Book

Computer Architecture and Parallel Processing

Kai Hwang, Faye A. Briggs

01 Jan 1984

TL;DR: The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

Abstract: The book is intended as a text to support two semesters of courses in computer architecture at the college senior and graduate levels. There are excellent problems for students at the end of each chapter. The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

1,410 citations

"An introduction to parallel algorit..." refers background in this paper

...Parallel architectures have been described in several books (see, for example, [18, 29])....
[...]

Journal Article

Data parallel algorithms

W. Daniel Hillis, Guy L. Steele

01 Dec 1986-Communications of The ACM

TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

Abstract: Parallel computers with tens of thousands of processors are typically programmed in a data parallel style, as opposed to the control parallel style used in multiprocessing. The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

1,000 citations

"An introduction to parallel algorit..." refers background in this paper

...Recent work on the mapping of PRAM algorithms on bounded-degree networks is described in [3, 13, 14, 20, 25]. Our presentation on the communication complexity of the matrix-multiplication problem in the shared-memory model is taken from [1]. Data-parallel algorithms are described in [15]....
[...]

Proceedings Article

Parallelism in random access machines

Steven Fortune, James C. Wyllie

01 May 1978

TL;DR: A model of computation based on random access machines operating in parallel and sharing a common memory is presented; nondeterministic versions of these machines can accept in polynomial time exactly the sets accepted by nondeterministic exponential-time-bounded Turing machines.

Abstract: A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAM's can accept in polynomial time exactly the sets accepted by polynomial tape bounded Turing machines; nondeterministic RAM's can accept in polynomial time exactly the sets accepted by nondeterministic exponential time bounded Turing machines. Similar results hold for other classes. The effect of limiting the size of the common memory is also considered.

951 citations

"An introduction to parallel algorit..." refers background in this paper

...Rigorous descriptions of shared-memory models were introduced later in [11,12]....
[...]

Journal Article

The Parallel Evaluation of General Arithmetic Expressions

Richard P. Brent¹

Australian National University¹

01 Apr 1974-Journal of the ACM

TL;DR: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log₂ n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time.

Abstract: It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log₂ n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.

864 citations

"An introduction to parallel algorit..." refers methods in this paper

...The WT scheduling principle is derived from a theorem in [7]. In the literature, this principle is commonly referred to as Brent's theorem or Brent's scheduling principle....
[...]
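The WT scheduling principle can be illustrated numerically. As it is usually stated, if a computation has T time units and performs W_i independent operations in time unit i, with W = W_1 + ... + W_T, then p processors can execute it in at most Σ⌈W_i/p⌉ ≤ Σ(W_i/p + 1) = W/p + T steps. The Python sketch below (function names are hypothetical) only evaluates both sides of that inequality; it ignores how operations are actually assigned to processors, which a real implementation must also handle.

```python
def leveled_schedule_steps(level_work, p):
    """Parallel steps to run a leveled computation on p processors.

    level_work[i] is the number of independent unit-time operations in
    time unit i; p processors finish that level in ceil(level_work[i] / p)
    steps, one level after another.
    """
    return sum((w + p - 1) // p for w in level_work)


def wt_bound(level_work, p):
    """The bound W/p + T, with W = total operations and T = number of levels."""
    return sum(level_work) / p + len(level_work)


# Usage: summing 16 numbers with a balanced tree does 8, 4, 2, 1 additions.
levels = [8, 4, 2, 1]
for p in (1, 3, 8):
    print(p, leveled_schedule_steps(levels, p), wt_bound(levels, p))
# 1 15 19.0
# 3 7 9.0
# 8 4 5.875
```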