Journal ArticleDOI

The power of parallel prefix

TL;DR: A deterministic parallel algorithm solves the prefix computation problem when the order of the elements is specified by a linked list. It runs on the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\left(\frac{\log n}{\log(2n/p)} \cdot \frac{n}{p}\right)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
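To make the problem statement concrete, here is a minimal Python sketch of prefix computation on a linked list stored as a successor array. It is not the algorithm of the paper. `list_prefix_sequential` is the obvious one-processor baseline, and `list_prefix_pointer_jumping` simulates the classical pointer-jumping technique, which finishes in about log n rounds but performs O(n log n) total work, so it does not give linear speedup. The names `values`, `succ`, `head`, and both function names are assumptions introduced for illustration.

```python
from operator import add


def list_prefix_sequential(values, succ, head, op=add):
    """Walk the list from `head`; out[i] is the product of all elements up to node i."""
    out = [None] * len(values)
    node, acc = head, None
    while node is not None:
        acc = values[node] if acc is None else op(acc, values[node])
        out[node] = acc
        node = succ[node]
    return out


def list_prefix_pointer_jumping(values, succ, op=add):
    """Simulate pointer jumping: each inner-loop iteration plays one virtual processor."""
    n = len(values)
    # Build predecessor pointers from the successor array.
    pred = [None] * n
    for i, s in enumerate(succ):
        if s is not None:
            pred[s] = i
    # Invariant: acc[i] is the product of the nodes after pred[i] up to and including i.
    acc = list(values)
    while any(p is not None for p in pred):
        new_acc, new_pred = list(acc), list(pred)
        for i in range(n):
            p = pred[i]
            if p is not None:
                new_acc[i] = op(acc[p], acc[i])  # keep left-to-right order
                new_pred[i] = pred[p]            # jump over the predecessor
        acc, pred = new_acc, new_pred
    return acc
```

Both functions only require `op` to be associative; the left-to-right argument order is preserved, so non-commutative operations such as string concatenation also work.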
Citations
Journal ArticleDOI
13 Mar 1990
TL;DR: The relationship between various models of parallel computation is investigated, using a newly defined concept of efficient simulation, and it is proved that the class PE is invariant across the shared memory models (PRAM's) and fully connected message passing machines.
Abstract: Theoretical research on parallel algorithms has focused on NC theory. This motivates the development of parallel algorithms that are extremely fast, but possibly wasteful in their use of processors. Such algorithms seem of limited interest for real applications currently run on parallel computers. This paper explores an alternative approach that emphasizes the efficiency of parallel algorithms. We define a complexity class PE of problems that can be solved by parallel algorithms that are efficient (the speedup is proportional to the number of processors used) and polynomially faster than sequential algorithms. Other complexity classes are also defined, in terms of time and efficiency: A class that has a slightly weaker efficiency requirement than PE, and a class that is a natural generalization of NC. We investigate the relationship between various models of parallel computation, using a newly defined concept of efficient simulation. This includes new models that reflect asynchrony and high communication latency in parallel computers. We prove that the class PE is invariant across the shared memory models (PRAM's) and fully connected message passing machines. These results show that our definitions are robust. Many open problems motivated by our approach are listed.

244 citations

Book ChapterDOI
29 Apr 1992
TL;DR: An optimal sequential solution of the color set size problem and string matching applications including a linear time algorithm for the problem of finding the longest substring common to at least k out of m input strings for all k between 1 and m is given.
Abstract: The Color Set Size problem is: Given a rooted tree of size n with l leaves colored from 1 to m, m ≤ l, for each vertex u find the number of different leaf colors in the subtree rooted at u. This problem formulation, together with the Generalized Suffix Tree data structure has applications to string matching. This paper gives an optimal sequential solution of the color set size problem and string matching applications including a linear time algorithm for the problem of finding the longest substring common to at least k out of m input strings for all k between 1 and m. In addition, parallel solutions to the above problems are given. These solutions may shed light on problems in computational biology, such as the multiple string alignment problem.
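The problem definition can be illustrated with a short Python sketch. It is a straightforward baseline that merges color sets bottom-up (always merging smaller sets into the largest one), not the optimal method of the paper. The input encoding (`children` as a dict of child lists, `leaf_color` as a dict from leaf to color) and the function name are assumptions made for this example.

```python
def color_set_sizes(children, leaf_color, root):
    """Return {vertex: number of distinct leaf colors in the subtree rooted there}."""
    sizes = {}
    colors = {}  # color sets of vertices whose parent has not been processed yet
    stack = [(root, False)]
    while stack:  # iterative post-order traversal
        u, visited = stack.pop()
        kids = children.get(u, [])
        if not visited:
            stack.append((u, True))
            stack.extend((v, False) for v in kids)
            continue
        if not kids:
            merged = {leaf_color[u]}
        else:
            sets = [colors.pop(v) for v in kids]
            merged = max(sets, key=len)  # reuse the largest child set
            for s in sets:
                if s is not merged:
                    merged |= s
        colors[u] = merged
        sizes[u] = len(merged)
    return sizes
```

For a root whose two subtrees carry leaf colors {1, 2} and {1}, the sketch reports 2 for the root, 2 for the first internal vertex, and 1 for the second.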

200 citations


Cites methods from "The power of parallel prefix"

  • ...A parallel bucket sort can be computed: (i) using ~ processors, EO(log n) parallel expected time on a priority CRCW PRAM [18] (we will discuss how this is achieved in the next paragraph); (ii) using ~ log log n processors, O(log n) time on a priority CRCW by the algorithm of [9]; (iii) or using $n^{1-\epsilon}$ processors, $O(n^{\epsilon})$ time on an EREW PRAM for any $\epsilon > 0$ [10];...

    [...]

  • ...We use parallel sorting [3, 5, 9, 10] and parallel prefix sum to implement the other...

    [...]

Journal ArticleDOI
TL;DR: A new class of image pyramids is introduced in which a global sampling structure close to that of the twofold reduced resolution next level is generated exclusively by local processes and the probabilistic algorithm exploits local ordering relations among independent identically distributed random variables.
Abstract: A new class of image pyramids is introduced in which a global sampling structure close to that of the twofold reduced resolution next level is generated exclusively by local processes. The probabilistic algorithm exploits local ordering relations among independent identically distributed random variables. The algorithm is superior to any coin-tossing-based procedure and converges to an optimal sampling structure in only three steps. It can be applied to either 1- or 2-dimensional lattices. Generation of stochastic pyramids has broad applicability. We discuss in detail curve processing in 2-dimensional image pyramids and labeling the mesh in massively parallel computers. We also mention investigation of the robustness of multiresolution algorithms and a fast parallel synthesis method for nonhomogeneous anisotropic random patterns.

162 citations

Journal ArticleDOI
TL;DR: An efficient technique for parallel manipulation of data structures that avoids memory access conflicts is presented and is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range.
Abstract: We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and “unicycular graphs.” Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.

100 citations

Journal ArticleDOI
TL;DR: Efficient parallel simulations are given for a variety of queueing networks having a global first come first served structure, and the problem of simulating the arrival and departure times for the first N jobs to a single G/G/1 queue is solved in time proportional to N/P + log P using P processors.
Abstract: New methods are presented for parallel simulation of discrete event systems that, when applicable, can usefully employ a number of processors much larger than the number of objects in the system being simulated. Abandoning the distributed event list approach, the simulation problem is posed using recurrence relations. We bring three algorithmic ideas to bear on parallel simulation: parallel prefix computation, parallel merging, and iterative folding. Efficient parallel simulations are given for (in turn) the G/G/1 queue, a variety of queueing networks having a global first come first served structure (e.g., a series of queues with finite buffers), acyclic networks of queues, and networks of queues with feedbacks and cycles. In particular, the problem of simulating the arrival and departure times for the first N jobs to a single G/G/1 queue is solved in time proportional to N/P + log P using P processors.
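One way to see how a queue simulation can become a prefix computation is sketched below in Python. The abstract does not spell out its recurrence, so this example assumes the standard first come first served single-server relation D_i = max(A_i, D_{i-1}) + S_i for arrival times A_i and service times S_i. Each job is encoded as a map x -> max(x + s, c), and composing such maps is associative, so the running compositions are a prefix computation. `itertools.accumulate` evaluates them sequentially here; any parallel prefix routine could stand in for it. All function names are assumptions introduced for illustration.

```python
from itertools import accumulate


def departures_sequential(arrivals, services):
    """Direct evaluation of D_i = max(A_i, D_{i-1}) + S_i, starting from an empty queue."""
    out, prev = [], float("-inf")
    for a, s in zip(arrivals, services):
        prev = max(a, prev) + s
        out.append(prev)
    return out


def compose(f, g):
    """Apply f, then g, where a pair (s, c) encodes the map x -> max(x + s, c)."""
    s1, c1 = f
    s2, c2 = g
    return (s1 + s2, max(c1 + s2, c2))


def departures_by_prefix(arrivals, services):
    """Same departure times, obtained as prefixes under the associative `compose`."""
    # Job i acts on the previous departure x as max(x, a) + s = max(x + s, a + s).
    maps = [(s, a + s) for a, s in zip(arrivals, services)]
    # Evaluating a composed map at x = -inf (empty queue) leaves only its c component.
    return [c for _, c in accumulate(maps, compose)]
```

With arrivals `[0, 1, 2, 10]` and service times `[3, 3, 3, 1]`, both functions return `[3, 6, 9, 11]`.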

81 citations