Journal ArticleDOI

The power of parallel prefix

TL;DR: This study presents a deterministic parallel algorithm for the prefix computation problem when the order of the elements is specified by a linked list. It assumes the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\left(\frac{\log n}{\log(2n/p)} \cdot \frac{n}{p}\right)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
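To make the problem statement concrete, here is a minimal sketch of prefix computation over a linked list using the textbook pointer-jumping technique, simulated sequentially in Python. This is not the algorithm of the paper: pointer jumping takes about log n synchronous rounds with one processor per element and does not achieve the linear speedup described in the abstract. The function name `list_prefix` and the predecessor-array representation of the list are assumptions made for illustration.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def list_prefix(
    values: List[T],
    pred: List[Optional[int]],
    op: Callable[[T, T], T],
) -> List[T]:
    """Prefix products along a linked list by pointer jumping.

    values[i] is the element stored at node i; pred[i] is the index of
    node i's predecessor in list order (None for the head). op must be
    associative but need not be commutative.
    """
    vals = list(values)
    ptr = list(pred)
    # Each pass of the loop simulates one synchronous parallel round.
    while any(p is not None for p in ptr):
        new_vals = list(vals)
        new_ptr = list(ptr)
        for i, p in enumerate(ptr):
            if p is not None:
                # Combine with the segment ending at the current pointer,
                # keeping list order (left operand comes first).
                new_vals[i] = op(vals[p], vals[i])
                # Jump: the pointer now skips over that segment.
                new_ptr[i] = ptr[p]
        vals, ptr = new_vals, new_ptr
    return vals


# List order is node 2 -> node 0 -> node 3 -> node 1, i.e. "a", "b", "c", "d".
result = list_prefix(["b", "d", "a", "c"], [2, 3, None, 0], lambda x, y: x + y)
print(result)  # ['ab', 'abcd', 'a', 'abc']
```

String concatenation is used as the operation because it is associative but not commutative, so the output shows that list order is respected.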
Citations
Journal ArticleDOI
TL;DR: New techniques are presented for the manipulation of sparse matrices on parallel MIMD computers that consider the following problems: matrix addition, matrix multiplication, row and column permutation, matrix transpose, matrix vector multiplication, and Gaussian elimination.

26 citations


Cites background from "The power of parallel prefix"

  • ...Conversely, a canonical representation can be converted into a full matricial form in time O(m/p) if the matrix is already initialized to zero, and time O(qr/p) otherwise....

    [...]

Journal ArticleDOI
TL;DR: This work proposes a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models, and shows that this model predicts, reasonably accurately, the actual known performances of several basic parallel models when solving these problems.

26 citations

Journal ArticleDOI
TL;DR: This paper shows how the problem of computing sums of the form $a_0 + a_1 + \cdots + a_i$, for $i = 0, 1, \ldots, n-1$, can be solved on a simple network, namely a binary tree of processors, and shows how to extend the solution to obtain an optimal-cost algorithm.
Abstract: Given $n$ numbers $a_0, a_1, \ldots, a_{n-1}$, it is required to compute all sums of the form $a_0 + a_1 + \cdots + a_i$, for $i = 0, 1, \ldots, n-1$. This problem arises in many applications and is trivial to solve sequentially in $O(n)$ time. Besides its practical importance, the problem gains an additional theoretical interest in parallel computation. A technique known as recursive doubling allows all sums to be computed in $O(\log n)$ time on a model of computation where $n$ processors communicate through an inverse perfect shuffle interconnection network. In this paper we show how the problem can be solved on a simple network, namely a binary tree of processors. In addition, we show how to extend our solution to obtain an optimal-cost algorithm. The algorithm uses $p$ processors and runs in $O((n/p) + \log p)$ time, for a cost of $O(n + p \log p)$. This cost is optimal when $p \log p = O(n)$. Finally, two applications of our results are illustrated, namely job scheduling with deadlines and the knapsack problem.

26 citations
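The n/p plus log p shape of the running time in the prefix-sums entry above can be illustrated with a blocked scheme: each of p workers scans its own block, the p block totals are combined, and each block is then shifted by the total of the blocks before it. The following is a sequential Python sketch of that general idea, not the tree-network algorithm from the cited paper; the name `blocked_prefix_sums` is an assumption, and the combining step over block totals is done with a plain sequential scan here rather than in log p parallel steps.

```python
from itertools import accumulate
from typing import List


def blocked_prefix_sums(a: List[int], p: int) -> List[int]:
    """Inclusive prefix sums computed block by block for p simulated workers."""
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(a)
    if n == 0:
        return []
    size = -(-n // p)  # ceiling of n / p: elements per block
    blocks = [a[s:s + size] for s in range(0, n, size)]
    # Phase 1: each worker scans its own block (about n/p steps each).
    local = [list(accumulate(block)) for block in blocks]
    # Phase 2: exclusive scan of the block totals (sequential here;
    # on a tree of processors this step is where the log p term arises).
    totals = [scanned[-1] for scanned in local]
    offsets = [0] + list(accumulate(totals))[:-1]
    # Phase 3: each worker adds the sum of all earlier blocks to its results.
    return [x + off for scanned, off in zip(local, offsets) for x in scanned]


print(blocked_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6], 3))
# [3, 4, 8, 9, 14, 23, 25, 31]
```

Phases 1 and 3 touch each element once per worker, which is why the total work stays linear in n when p is small relative to n.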

Journal ArticleDOI
Eunice E. Santos
TL;DR: The problem of designing efficient parallel algorithms for summing and prefix summing for certain classes of the LogP model is studied and it is shown that any optimal summing algorithm must have a certain inherent structure.

26 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a parallel algorithm for computing the visible portion of a simple polygonal chain with $n$ vertices from a point in the plane in $O(\log n)$ time using $O(n / \log n)$ processors in the CREW-PRAM computational model.
Abstract: We present a parallel algorithm for computing the visible portion of a simple polygonal chain with $n$ vertices from a point in the plane. The algorithm runs in $O(\log n)$ time using $O(n / \log n)$ processors in the CREW-PRAM computational model, and hence is asymptotically optimal.

26 citations