Journal ArticleDOI

The power of parallel prefix

TL;DR: This study presents a deterministic parallel algorithm for the prefix computation problem when the order of the elements is specified by a linked list, assuming the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O((log n / log(2n/p)) · (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
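To make the problem statement concrete, here is a minimal sketch of prefix computation over a linked list using the classic pointer-jumping technique, simulated sequentially in synchronous rounds. This is not the algorithm of the paper: pointer jumping performs on the order of n log n operations in total, so it is not work-optimal. The function name `list_prefix` and the array-based list representation (`values`, `succ`) are assumptions made for illustration.

```python
from operator import add


def list_prefix(values, succ, op=add):
    """Prefix products along a linked list stored in arrays.

    values[i] is the element at node i; succ[i] is the index of the next
    node, or None at the tail. Returns acc, where acc[i] is the product of
    all elements from the head of the list up to and including node i.
    (Hypothetical representation, chosen for illustration only.)
    """
    n = len(values)
    # Build predecessor pointers from the successor pointers.
    pred = [None] * n
    for i, s in enumerate(succ):
        if s is not None:
            pred[s] = i

    acc = list(values)
    # Each pass of the loop simulates one synchronous parallel round.
    while any(p is not None for p in pred):
        new_acc = list(acc)
        new_pred = list(pred)
        for i in range(n):  # conceptually: one processor per node
            p = pred[i]
            if p is not None:
                # Earlier segment goes on the left, so op need not commute.
                new_acc[i] = op(acc[p], acc[i])
                # Jump: skip over the segment that was just absorbed.
                new_pred[i] = pred[p]
        acc, pred = new_acc, new_pred
    return acc


# List order is node 0 -> node 2 -> node 1.
print(list_prefix([1, 2, 3], [2, None, 1]))
```

Because the distance covered by each pointer doubles every round, the loop runs about log2(n) times; the associativity of the operation is what allows segments to be combined in this order.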
Citations
Journal ArticleDOI
TL;DR: Results show that linear speedup can be obtained for up to p ≤ e/log² n processors on graphs satisfying e ≥ n log n, with a further bound stated for the case where a more efficient integer sorting algorithm is available.
Abstract: A parallel algorithm for computing the connected components of undirected graphs is presented. Shared memory computation models are assumed. For a graph of e edges and n nodes, the time complexity of the algorithm is O(e/p + (n log n)/p + log² n) with p processors. The algorithm can be further refined to yield time complexity O(H(e, n, p)/p + (n log n)/(p log(n/p)) + log² n), where H(e, n, p) is very close to O(e). These results show that linear speedup can be obtained for up to p ≤ e/log² n processors when e ≥ n log n. Linear speedup can still be achieved with up to p ≤ n^ε processors, 0 ≤ ε

70 citations

Journal ArticleDOI
TL;DR: A simple deterministic parallel algorithm that runs on a CRCW PRAM and sorts n integers of size polynomial in n in time O(log n) using O(n log log n / log n) processors is presented.
Abstract: We present a simple deterministic parallel algorithm that runs on a CRCW PRAM and sorts n integers of size polynomial in n in time O(log n) using O(n log log n / log n) processors. It is closer to optimality than any previously known deterministic algorithm that solves the stated restricted sorting problem in polylog time.
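As a rough illustration of why the restriction to integers of size polynomial in n matters, the sketch below sorts keys smaller than n^c sequentially with c stable bucket passes in base n. It is a plain radix sort, not the parallel CRCW algorithm of the abstract; the function name `sort_polynomial_range` and the parameter `c` are assumptions introduced here.

```python
def sort_polynomial_range(keys, c):
    """Sort n non-negative integers smaller than n**c (n = len(keys)).

    Each key is treated as c digits in base n, and one stable bucket pass
    is made per digit, least significant digit first.
    (Sequential illustration only; names are hypothetical.)
    """
    n = len(keys)
    if n <= 1:
        return list(keys)
    if any(k < 0 or k >= n ** c for k in keys):
        raise ValueError("keys must lie in the range [0, n**c)")

    out = list(keys)
    for d in range(c):
        divisor = n ** d
        buckets = [[] for _ in range(n)]
        for k in out:
            # Appending preserves the order from earlier passes (stability).
            buckets[(k // divisor) % n].append(k)
        out = [k for bucket in buckets for k in bucket]
    return out


print(sort_polynomial_range([15, 3, 0, 9], c=2))
```

For a constant c this takes a number of operations linear in n, which is the sequential baseline against which the processor-time product of a parallel integer sort can be compared.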

65 citations


Cites methods from "The power of parallel prefix"

  • ...difficult to design such algorithms for m = (log n)^O(1) (Cole and Vishkin, 1986a; Reif, 1985) and, in general, optimal algorithms with a running time of O(m^ε + log n) for fixed ε > 0 (Kruskal et al., 1985), the case... [Footnote: This work was supported by the Deutsche Forschungsgemeinschaft, SFB 124, TP B 2.]

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The Connection Machine® computer system supports a data parallel programming style, making it a natural target architecture for Fortran 8x array constructs, and the CM Fortran compiler minimizes data motion for aligned array operations, minimizes transfers between the Connection Machine and the VAX and minimizes context switching for masked computations.
Abstract: The Connection Machine® computer system supports a data parallel programming style, making it a natural target architecture for Fortran 8x array constructs. The Connection Machine Fortran compiler generates VAX code that performs scalar operations and directs the Connection Machine to perform array operations. The Connection Machine virtual processor mechanism supports elemental operations on very large arrays. Most array operators and intrinsic functions map into single instructions or short instruction sequences. Noncontiguous array sections, array-valued subscripts, and parallel constructs such as WHERE and FORALL are also readily accommodated on the Connection Machine. In addition to such customary optimizations as common subexpression elimination, the CM Fortran compiler minimizes data motion for aligning array operations, minimizes transfers between the Connection Machine and the VAX and minimizes context switching for masked computations.

61 citations


Cites background from "The power of parallel prefix"

  • ...as add, max, or logior) and performs a parallel prefix computation [12, 10]...

Journal ArticleDOI
01 Jan 1988
TL;DR: It appears that as long as PRAMs cannot achieve the desired cost and performance goals, programmers must contend with carefully designing algorithms for specific architectures.
Abstract: Some of the problems encountered in mapping a parallel algorithm are examined, emphasizing mappings of vision algorithms onto mesh, hypercube, mesh-of-trees, pyramid, and parallel random-access machines (PRAMs) having many simple processors, each with a small amount of memory. Approaches that have been suggested include simulating the ideal architectures, and using general data movement operations. Each of these is shown to occasionally produce unacceptably inefficient implementations. It appears that as long as PRAMs cannot achieve the desired cost and performance goals, programmers must contend with carefully designing algorithms for specific architectures.

54 citations

Journal ArticleDOI
TL;DR: A technique is presented that can be used to obtain efficient parallel geometric algorithms in the EREW PRAM computational model, solving a number of geometric problems optimally in O(log n) time using O(n/log n) processors, where n is the input size of a problem.
Abstract: We present a technique that can be used to obtain efficient parallel geometric algorithms in the EREW PRAM computational model. This technique enables us to solve optimally a number of geometric problems in O(log n) time using O(n/log n) EREW PRAM processors, where n is the input size of a problem. These problems include: computing the convex hull of a set of points in the plane that are given sorted, computing the convex hull of a simple polygon, computing the common intersection of half-planes whose slopes are given sorted, finding the kernel of a simple polygon, triangulating a set of points in the plane that are given sorted, triangulating monotone polygons and star-shaped polygons, and computing the all dominating neighbors of a sequence of values. PRAM algorithms for these problems were previously known to be optimal (i.e., in O(log n) time and using O(n/log n) processors) only on the CREW PRAM, which is a stronger model than the EREW PRAM.

49 citations