The power of parallel prefix

doi:10.1109/TC.1985.6312202

Journal Article•DOI•

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Carnegie Mellon University², Hebrew University of Jerusalem³

01 Oct 1985-IEEE Transactions on Computers (IEEE)-Vol. 34, Iss: 10, pp 965-968

TL;DR: A deterministic parallel algorithm is presented for the prefix computation problem when the order of the elements is specified by a linked list. The study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).

Abstract: The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\left(\frac{\log n}{\log(2n/p)} \cdot \frac{n}{p}\right)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem, when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
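
To make the problem statement concrete, the following is a minimal sketch of prefix computation over a linked list using standard pointer jumping, simulated sequentially in Python. It is not the algorithm of the paper: it conceptually uses one processor per node and performs on the order of n log n operations in total, whereas the abstract concerns p ≤ n processors with linear speedup. The names `list_prefix`, `pred`, and `op` are assumptions made for illustration only.

```python
from operator import add


def list_prefix(values, pred, op=add):
    """Prefix products over a linked list by pointer jumping.

    values[i] is the element stored at node i; pred[i] is the index of
    node i's predecessor in list order, or None for the head.
    Returns out where out[i] = a_head * ... * a_i in list order.
    `op` must be associative; it does not need to be commutative.
    """
    val = list(values)
    ptr = list(pred)
    # Each pass of the while loop models one synchronous parallel round:
    # every node reads the old arrays and writes into fresh copies.
    while any(p is not None for p in ptr):
        new_val = list(val)
        new_ptr = list(ptr)
        for i, p in enumerate(ptr):  # conceptually one processor per node
            if p is not None:
                new_val[i] = op(val[p], val[i])  # earlier segment on the left
                new_ptr[i] = ptr[p]              # jump over the predecessor
        val, ptr = new_val, new_ptr
    return val


# Hypothetical example: list order is node 2 -> node 0 -> node 3 -> node 1.
values = ["b", "d", "a", "c"]
pred = [2, 3, None, 0]
print(list_prefix(values, pred))  # ['ab', 'abcd', 'a', 'abc']
```

String concatenation is used in the example because it is associative but not commutative, so the output shows that list order is respected. The number of rounds is about log2(n), since the distance covered by each pointer doubles every round.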

Citations

Journal Article•DOI•

An efficient and fast parallel-connected component algorithm

Yujie Han¹, Robert A. Wagner²•Institutions (2)

University of Kentucky¹, Duke University²

01 Jul 1990-Journal of the ACM

TL;DR: Results show that linear speedup can be obtained for up to $p \le e/\log^2 n$ processors on graphs satisfying $e \ge n \log n$, with further cases possible if a more efficient integer sorting algorithm is available.

Abstract: A parallel algorithm for computing the connected components of undirected graphs is presented. Shared memory computation models are assumed. For a graph of $e$ edges and $n$ nodes, the time complexity of the algorithm is $O(e/p + (n \log n)/p + \log^2 n)$ with $p$ processors. The algorithm can be further refined to yield time complexity $O(H(e, n, p)/p + (n \log n)/(p \log(n/p)) + \log^2 n)$, where $H(e, n, p)$ is very close to $O(e)$. These results show that linear speedup can be obtained for up to $p \le e/\log^2 n$ processors when $e \ge n \log n$. Linear speedup can still be achieved with up to $p \le n^{\epsilon}$ processors, $0 \le \epsilon$ ...

70 citations

Journal Article•DOI•

Towards optimal parallel bucket sorting

Torben Hagerup

01 Oct 1987-Information & Computation

TL;DR: A simple deterministic parallel algorithm is presented that runs on a CRCW PRAM and sorts $n$ integers of size polynomial in $n$ in time $O(\log n)$ using $O(n \log\log n / \log n)$ processors.

Abstract: We present a simple deterministic parallel algorithm that runs on a CRCW PRAM and sorts $n$ integers of size polynomial in $n$ in time $O(\log n)$ using $O(n \log\log n / \log n)$ processors. It is closer to optimality than any previously known deterministic algorithm that solves the stated restricted sorting problem in polylog time.

65 citations

Cites methods from "The power of parallel prefix"

...difficult to design such algorithms for $m = (\log n)^{O(1)}$ (Cole and Vishkin, 1986a; Reif, 1985) and, in general, optimal algorithms with a running time of $O(m^{\epsilon} + \log n)$ for fixed $\epsilon > 0$ (Kruskal et al., 1985), the case... (Footnote captured in the excerpt: "This work was supported by the Deutsche Forschungsgemeinschaft, SFB 124, TP B 2.")

Proceedings Article•DOI•

Compiling Fortran 8x array features for the connection machine computer system

Eugene Albert, Kathleen Knobe, Joan D. Lukas¹, Guy L. Steele•Institutions (1)

University of Massachusetts Amherst¹

01 Jan 1988

TL;DR: The Connection Machine® computer system supports a data parallel programming style, making it a natural target architecture for Fortran 8x array constructs, and the CM Fortran compiler minimizes data motion for aligned array operations, minimizes transfers between the Connection Machine and the VAX and minimizes context switching for masked computations.

Abstract: The Connection Machine® computer system supports a data parallel programming style, making it a natural target architecture for Fortran 8x array constructs. The Connection Machine Fortran compiler generates VAX code that performs scalar operations and directs the Connection Machine to perform array operations. The Connection Machine virtual processor mechanism supports elemental operations on very large arrays. Most array operators and intrinsic functions map into single instructions or short instruction sequences. Noncontiguous array sections, array-valued subscripts, and parallel constructs such as WHERE and FORALL are also readily accommodated on the Connection Machine. In addition to such customary optimizations as common subexpression elimination, the CM Fortran compiler minimizes data motion for aligning array operations, minimizes transfers between the Connection Machine and the VAX and minimizes context switching for masked computations.

61 citations

Cites background from "The power of parallel prefix"

...as add, max, or logior) and performs a parallel prefix computation [12, 10]...

Journal Article•DOI•

Mapping vision algorithms to parallel architectures

Quentin F. Stout¹•Institutions (1)

University of Michigan¹

01 Jan 1988

TL;DR: It appears that as long as PRAMs cannot achieve the desired cost and performance goals, programmers must contend with carefully designing algorithms for specific architectures.

Abstract: Some of the problems encountered in mapping a parallel algorithm are examined, emphasizing mappings of vision algorithms onto mesh, hypercube, mesh-of-trees, pyramid, and parallel random-access machines (PRAMs) having many simple processors, each with a small amount of memory. Approaches that have been suggested include simulating the ideal architectures, and using general data movement operations. Each of these is shown to occasionally produce unacceptably inefficient implementations. It appears that as long as PRAMs cannot achieve the desired cost and performance goals, programmers must contend with carefully designing algorithms for specific architectures.

54 citations

Journal Article•DOI•

Efficient geometric algorithms on the EREW PRAM

Danny Z. Chen¹•Institutions (1)

University of Notre Dame¹

01 Jan 1995-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A technique is presented for obtaining efficient parallel geometric algorithms in the EREW PRAM computational model. It solves a number of geometric problems optimally in O(log n) time using O(n/log n) EREW PRAM processors, where n is the input size of a problem.

Abstract: We present a technique that can be used to obtain efficient parallel geometric algorithms in the EREW PRAM computational model. This technique enables us to solve optimally a number of geometric problems in O(log n) time using O(n/log n) EREW PRAM processors, where n is the input size of a problem. These problems include: computing the convex hull of a set of points in the plane that are given sorted, computing the convex hull of a simple polygon, computing the common intersection of half-planes whose slopes are given sorted, finding the kernel of a simple polygon, triangulating a set of points in the plane that are given sorted, triangulating monotone polygons and star-shaped polygons, and computing the all dominating neighbors of a sequence of values. PRAM algorithms for these problems were previously known to be optimal (i.e., in O(log n) time and using O(n/log n) processors) only on the CREW PRAM, which is a stronger model than the EREW PRAM.

49 citations
