The power of parallel prefix

doi:10.1109/TC.1985.6312202

Journal Article•DOI•

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Carnegie Mellon University², Hebrew University of Jerusalem³

01 Oct 1985-IEEE Transactions on Computers (IEEE)-Vol. 34, Iss: 10, pp 965-968

TL;DR: A deterministic parallel algorithm is presented for the prefix computation problem when the order of the elements is specified by a linked list. The study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).

Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O((log n / log(2n/p)) × (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^{1-ε}) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
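To make the problem statement concrete, the following is a minimal Python sketch of prefix computation over a linked list. It uses plain pointer jumping, simulated sequentially round by round, and is not the algorithm of the paper: pointer jumping performs on the order of n log n operations, so it does not achieve the linear speedup described in the abstract. The names `list_prefix`, `values`, and `succ` are illustrative assumptions, not taken from the source.

```python
from operator import add


def list_prefix(values, succ, op=add):
    """Prefix products along a linked list by pointer jumping.

    values[i] is the element stored in node i; succ[i] is the index of the
    next node, or None at the tail. Returns out, where out[i] is the product
    (under the associative operation op) of all elements from the head of the
    list up to and including node i, taken in list order.
    """
    n = len(values)
    # Pointer jumping runs towards the head, so build predecessor links first.
    pred = [None] * n
    for i, s in enumerate(succ):
        if s is not None:
            pred[s] = i

    out = list(values)
    # Each pass of the while loop simulates one synchronous parallel round:
    # every read uses the arrays from the previous round.
    while any(p is not None for p in pred):
        new_out, new_pred = list(out), list(pred)
        for i in range(n):  # conceptually "for all nodes i in parallel"
            p = pred[i]
            if p is not None:
                # The predecessor's partial product goes on the left, so the
                # result stays correct for non-commutative operations.
                new_out[i] = op(out[p], out[i])
                new_pred[i] = pred[p]
        out, pred = new_out, new_pred
    return out


# List order is node 2 -> node 0 -> node 3 -> node 1.
succ = [3, None, 0, 1]
print(list_prefix([10, 20, 30, 40], succ))      # [40, 100, 30, 80]
print(list_prefix(["b", "d", "a", "c"], succ))  # ['ab', 'abcd', 'a', 'abc']
```

The string example shows that only associativity is required; the operation need not be commutative.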

Citations

Journal Article•DOI•

A complexity theory of efficient parallel algorithms

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³•Institutions (3)

University of Maryland, College Park¹, Hebrew University of Jerusalem², IBM³

13 Mar 1990

TL;DR: The relationship between various models of parallel computation is investigated, using a newly defined concept of efficient simulation, and it is proved that the class PE is invariant across the shared memory models (PRAM's) and fully connected message passing machines.

Abstract: Theoretical research on parallel algorithms has focused on NC theory. This motivates the development of parallel algorithms that are extremely fast, but possibly wasteful in their use of processors. Such algorithms seem of limited interest for real applications currently run on parallel computers. This paper explores an alternative approach that emphasizes the efficiency of parallel algorithms. We define a complexity class PE of problems that can be solved by parallel algorithms that are efficient (the speedup is proportional to the number of processors used) and polynomially faster than sequential algorithms. Other complexity classes are also defined, in terms of time and efficiency: A class that has a slightly weaker efficiency requirement than PE, and a class that is a natural generalization of NC. We investigate the relationship between various models of parallel computation, using a newly defined concept of efficient simulation. This includes new models that reflect asynchrony and high communication latency in parallel computers. We prove that the class PE is invariant across the shared memory models (PRAM's) and fully connected message passing machines. These results show that our definitions are robust. Many open problems motivated by our approach are listed.

244 citations

Book Chapter•DOI•

Color Set Size Problem with Application to String Matching

Lucas Chi Kwong Hui¹•Institutions (1)

University of California, Davis¹

29 Apr 1992

TL;DR: An optimal sequential solution of the color set size problem and string matching applications including a linear time algorithm for the problem of finding the longest substring common to at least k out of m input strings for all k between 1 and m is given.

Abstract: The Color Set Size problem is: Given a rooted tree of size n with l leaves colored from 1 to m, m ≤ l, for each vertex u find the number of different leaf colors in the subtree rooted at u. This problem formulation, together with the Generalized Suffix Tree data structure has applications to string matching. This paper gives an optimal sequential solution of the color set size problem and string matching applications including a linear time algorithm for the problem of finding the longest substring common to at least k out of m input strings for all k between 1 and m. In addition, parallel solutions to the above problems are given. These solutions may shed light on problems in computational biology, such as the multiple string alignment problem.
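As an illustration of the problem statement only, here is a small Python sketch that computes color set sizes by merging child color sets bottom-up (smaller sets into the largest one). This is a simple baseline with roughly n log n set insertions, not the optimal algorithm the paper describes. The input encoding (`children` as adjacency lists, `leaf_color` as a dict) and the function name are assumptions made for the example.

```python
def color_set_sizes(children, leaf_color, root=0):
    """Return css, where css[u] is the number of distinct leaf colors in the
    subtree rooted at u.

    children[u] lists the children of vertex u (empty for a leaf);
    leaf_color maps each leaf to its color.
    """
    n = len(children)
    css = [0] * n
    colors = [None] * n  # color set of each processed subtree

    # Iterative traversal that visits every parent before its descendants.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])

    # Reversing that order processes all children before their parent.
    for u in reversed(order):
        if not children[u]:
            merged = {leaf_color[u]}
        else:
            # Reuse the largest child set and fold the smaller ones into it.
            merged = max((colors[c] for c in children[u]), key=len)
            for c in children[u]:
                if colors[c] is not merged:
                    merged |= colors[c]
                colors[c] = None  # child sets are no longer needed
        colors[u] = merged
        css[u] = len(merged)
    return css


# Root 0 has children 1 and 2; leaves 3, 4 hang under 1 and leaves 5, 6 under 2.
children = [[1, 2], [3, 4], [5, 6], [], [], [], []]
leaf_color = {3: 1, 4: 2, 5: 1, 6: 1}
print(color_set_sizes(children, leaf_color))  # [2, 2, 1, 1, 1, 1, 1]
```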

200 citations

Cites methods from "The power of parallel prefix"

...A parallel bucket sort can be computed: (i) using ~ processors, EO(log n) parallel expected time on a priority CRCW PRAM [18] (we will discuss how this is achieved in the next paragraph); (ii) using ~ log log n processors, O(log n) time on a priority CRCW by the algorithm of [9]; (iii) or using n^{1-ε} processors, O(n^ε) time on an EREW PRAM for any ε > 0 [10];...
[...]
...We use parallel sorting [3, 5, 9, 10] and parallel prefix sum to implement the other...
[...]

Journal Article•DOI•

Stochastic image pyramids

Peter Meer¹•Institutions (1)

University of Maryland, College Park¹

01 Mar 1989-Graphical Models / Graphical Models and Image Processing / Computer Vision, Graphics, and Image Processing

TL;DR: A new class of image pyramids is introduced in which a global sampling structure close to that of the twofold reduced resolution next level is generated exclusively by local processes and the probabilistic algorithm exploits local ordering relations among independent identically distributed random variables.

Abstract: A new class of image pyramids is introduced in which a global sampling structure close to that of the twofold reduced resolution next level is generated exclusively by local processes. The probabilistic algorithm exploits local ordering relations among independent identically distributed random variables. The algorithm is superior to any coin tossing based procedure and converges to an optimal sampling structure in only three steps. It can be applied to either 1- or 2-dimensional lattices. Generation of stochastic pyramids has broad applicability. We discuss in detail curve processing in 2-dimensional image pyramids and labeling the mesh in massively parallel computers. We also mention investigation of the robustness of multiresolution algorithms and a fast parallel synthesis method for nonhomogeneous anisotropic random patterns.

162 citations

Journal Article•DOI•

Efficient parallel algorithms for graph problems

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³•Institutions (3)

University of Maryland, College Park¹, Hebrew University of Jerusalem², IBM³

01 Mar 1990-Algorithmica

TL;DR: An efficient technique for parallel manipulation of data structures that avoids memory access conflicts is presented and is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range.

Abstract: We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and “unicycular graphs.” Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.

100 citations

Journal Article•DOI•

Algorithms for unboundedly parallel simulations

Albert G. Greenberg¹, Boris Dmitrievich Lubachevsky², Isi Mitrani•Institutions (2)

Bell Labs¹, University of Newcastle²

01 Aug 1991-ACM Transactions on Computer Systems

TL;DR: Efficient parallel simulations are given for a variety of queueing networks having a global first come first served structure, and the problem of simulating the arrival and departure times for the first N jobs to a single G/G/1 queue is solved in time proportional to N/P + log P using P processors.

Abstract: New methods are presented for parallel simulation of discrete event systems that, when applicable, can usefully employ a number of processors much larger than the number of objects in the system being simulated. Abandoning the distributed event list approach, the simulation problem is posed using recurrence relations. We bring three algorithmic ideas to bear on parallel simulation: parallel prefix computation, parallel merging, and iterative folding. Efficient parallel simulations are given for (in turn) the G/G/1 queue, a variety of queueing networks having a global first come first served structure (e.g., a series of queues with finite buffers), acyclic networks of queues, and networks of queues with feedbacks and cycles. In particular, the problem of simulating the arrival and departure times for the first N jobs to a single G/G/1 queue is solved in time proportional to N/P + log P using P processors.
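The idea of posing a queue simulation as a recurrence solvable by prefix computation can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions (a single first come first served server, an initially empty system, and the first job arriving after the first interarrival time), not the authors' implementation. Job i turns the previous departure time d into max(d, A_i) + S_i, which is encoded here as a pair under an associative composition; `itertools.accumulate` evaluates the prefixes sequentially, and a parallel prefix routine could be substituted because the operator is associative. The function and variable names are hypothetical.

```python
from itertools import accumulate


def gg1_departures(interarrival, service):
    """Arrival and departure times of a FCFS single-server queue.

    Assumes an empty system at time 0. The pair (s, c) stands for the map
    d -> max(d + s, c), so job i with arrival A_i and service time S_i is
    (S_i, A_i + S_i), i.e. d -> max(d, A_i) + S_i.
    """
    # Arrival times are themselves a prefix sum of the interarrival times.
    arrivals = list(accumulate(interarrival))
    steps = [(s, a + s) for a, s in zip(arrivals, service)]

    def compose(f, g):
        # Apply f, then g: max(max(d + fs, fc) + gs, gc).
        fs, fc = f
        gs, gc = g
        return (fs + gs, max(fc + gs, gc))

    # With no job before the first one (d = -infinity), each prefix map
    # evaluates to its constant term, which is the departure time.
    departures = [c for _, c in accumulate(steps, compose)]
    return arrivals, departures


arrivals, departures = gg1_departures([1, 1, 3], [2, 2, 1])
print(arrivals)    # [1, 2, 5]
print(departures)  # [3, 5, 6]
```

Encoding each job as a pair is what makes the recurrence a prefix problem: composing two such maps yields another map of the same form, so partial results can be combined in any grouping.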

81 citations
