Journal ArticleDOI

The power of parallel prefix

TL;DR: This study presents a deterministic parallel algorithm for the prefix computation problem when the order of the elements is specified by a linked list. It assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O((log n / log(2n/p)) · (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem, when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
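To make the problem statement concrete, here is a minimal sketch of prefix computation on a linked list using pointer jumping. This is not the algorithm from the paper: it is a simpler, well-known technique that uses one virtual processor per node, takes about log2(n) rounds, and performs O(n log n) operations in total, so it does not achieve the linear speedup claimed in the abstract. The function name `list_prefix` and the array-based list encoding (`pred[i]` holds the index of the node before node i) are assumptions made for illustration.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def list_prefix(
    values: List[T],
    pred: List[Optional[int]],
    op: Callable[[T, T], T],
) -> List[T]:
    """Return out[i] = product of all elements from the head through node i.

    values[i] is the element stored in node i.
    pred[i] is the index of the node preceding i in list order (None for the head).
    op must be associative; it does not need to be commutative.
    """
    n = len(values)
    acc = list(values)   # acc[i] covers the nodes after link[i], up to and including i
    link = list(pred)    # link[i] jumps further back every round

    # Each pass of the while loop simulates one synchronous parallel round.
    while any(p is not None for p in link):
        new_acc = list(acc)
        new_link = list(link)
        for i in range(n):          # one virtual processor per node
            p = link[i]
            if p is not None:
                # Earlier segment goes on the left to preserve list order.
                new_acc[i] = op(acc[p], acc[i])
                new_link[i] = link[p]
        acc, link = new_acc, new_link
    return acc


# List order is node 2 -> node 0 -> node 3 -> node 1, holding "a", "b", "c", "d".
values = ["b", "d", "a", "c"]
pred = [2, 3, None, 0]
result = list_prefix(values, pred, lambda x, y: x + y)
```

String concatenation is used as the operation because it is associative but not commutative, so the result shows that list order is respected even though the nodes are stored out of order.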
Citations
Journal ArticleDOI
TL;DR: The size-time complexity of computing prefixes with Boolean networks, which are synchronized interconnections of Boolean gates and one-bit storage devices, is characterized and area-time optimal circuits are obtained for both boundary and nonboundary I/O protocols.
Abstract: The prefix problem consists of computing all the products x_0 x_1 … x_j (j = 0, … , N - 1), given a sequence x = (x_0, x_1, … , x_{N-1}) of elements in a semigroup. In this paper we completely characterize the size-time complexity of computing prefixes with Boolean networks, which are synchronized interconnections of Boolean gates and one-bit storage devices. This complexity crucially depends upon two properties of the underlying semigroup, which we call cycle-freedom (no cycle of length greater than one in the Cayley graph of the semigroup), and memory-induciveness (arbitrarily long products of semigroup elements are true functions of all their factors). A nontrivial characterization is given of non-memory-inducive semigroups as those whose recurrent subsemigroup (formed by the elements with self-loops in the Cayley graph) is the direct product of a left-zero semigroup and a right-zero semigroup. Denoting by S and T size and computation time, respectively, we have S = Θ((N/T) log(N/T)) for memory-inducive non-cycle-free semigroups, and S = Θ(N/T) for all other semigroups. We have T ∈ [Ω(log N), O(N)] for all semigroups, with the exception of those whose recurrent subsemigroup is a right-zero semigroup, for which T ∈ [Ω(1), O(N)]. The preceding results are also extended to the VLSI model of computation. Area-time optimal circuits are obtained for both boundary and nonboundary I/O protocols.
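The role of left-zero and right-zero semigroups can be illustrated with a short sequential sketch. It only shows what the prefix products look like in each case; it does not model Boolean networks or the size-time bounds. The helper names `prefixes`, `left_zero`, and `right_zero` are assumptions made for illustration.

```python
from itertools import accumulate
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefixes(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Return [x_0, x_0 x_1, ..., x_0 x_1 ... x_{N-1}] for an associative op."""
    return list(accumulate(xs, op))


def left_zero(x: T, y: T) -> T:
    """Left-zero semigroup: every product equals its first factor."""
    return x


def right_zero(x: T, y: T) -> T:
    """Right-zero semigroup: every product equals its last factor."""
    return y


xs = ["a", "b", "c", "d"]
left = prefixes(xs, left_zero)                 # every prefix collapses to x_0
right = prefixes(xs, right_zero)               # prefix j is just x_j
concat = prefixes(xs, lambda x, y: x + y)      # each prefix depends on all its factors
```

In the right-zero case each output depends on a single input, which is consistent with the constant lower bound on T stated for that case. Concatenation, by contrast, is an operation whose products depend on every factor.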

16 citations


Cites background from "The power of parallel prefix"

  • ...Prefix computations occur in the solution of several significant problems such as carry-look-ahead addition [7, 21], the evolution of finite-state machines [19], linear recurrences [18], digital filtering [5], various graph problems [17], sorting in bit models of computation [3, 11], scheduling [13], and others....


Proceedings ArticleDOI
30 Oct 1989
TL;DR: The problem of simulating a parallel random-access machine (PRAM) with n processors and memory size m ≥ n on an n-node bounded degree network (BDN) is considered, and a deterministic solution to the simulation problem is presented.
Abstract: The problem of simulating a parallel random-access machine (PRAM) with n processors and memory size m ≥ n on an n-node bounded degree network (BDN) is considered. Since many of the more efficient PRAM algorithms use an amount of shared memory not much larger than the number of processors, the case in which m = o(n^(1+ε)) is considered, and a deterministic solution to the simulation problem is presented. For m = n (log n)^O(1) the running time of O(log n log log n) is within a factor of O(log log n) of the lower bound imposed by the diameter of the network.

16 citations

Journal ArticleDOI
TL;DR: This work describes an n-processor, O(log(n) log log(n))-time CRCW algorithm to construct the Voronoi diagram for a set of point-sites in the plane.
Abstract: We describe an n-processor, O(log(n) log log(n))-time CRCW algorithm to construct the Voronoi diagram for a set of n point-sites in the plane.

16 citations

Book ChapterDOI
01 Jan 2000
TL;DR: This work describes general methods for designing deterministic parallel algorithms in computational geometry and focuses on techniques for shared-memory parallel machines.
Abstract: We describe general methods for designing deterministic parallel algorithms in computational geometry. We focus on techniques for shared-memory parallel machines, which we describe and illustrate with examples. We also discuss some open problems in this area.

16 citations