The power of parallel prefix

doi:10.1109/TC.1985.6312202

Journal Article

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³

University of Illinois at Urbana–Champaign¹, Carnegie Mellon University², Hebrew University of Jerusalem³

01 Oct 1985, IEEE Transactions on Computers (IEEE), Vol. 34, Iss. 10, pp. 965–968

TL;DR: This study solves the prefix computation problem when the order of the elements is specified by a linked list, assuming the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).

Abstract: The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\big((\log n / \log(2n/p)) \cdot (n/p)\big)$-time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).
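
To make the problem statement concrete, here is a minimal sketch that computes prefix products along a linked list by pointer jumping (Wyllie's technique). It is not the Kruskal, Rudolph and Snir algorithm: with n processors it needs about ⌈log₂ n⌉ rounds and O(n log n) total work, so it does not reach the linear speedup the abstract claims. The `succ` array encoding (with `None` at the tail) and the name `list_prefix` are assumptions for illustration, and each `while` iteration simulates one synchronous parallel round sequentially.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def list_prefix(
    succ: List[Optional[int]],
    values: List[T],
    op: Callable[[T, T], T],
) -> List[T]:
    """Prefix products a_head * ... * a_i along a linked list (pointer jumping).

    succ[i] is the index of the node after node i, or None for the tail.
    op must be associative; it need not be commutative.
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    n = len(succ)
    # Predecessor pointers: each node writes into a distinct cell,
    # because every node has at most one predecessor.
    pred: List[Optional[int]] = [None] * n
    for i, s in enumerate(succ):
        if s is not None:
            pred[s] = i
    val = list(values)
    # Each iteration simulates one synchronous parallel round: all reads
    # use the old arrays, and node i writes only its own new entries.
    while any(p is not None for p in pred):
        new_val = list(val)
        new_pred = list(pred)
        for i in range(n):
            p = pred[i]
            if p is not None:
                # val[p] covers the segment just before node i's segment,
                # so it is combined on the left.
                new_val[i] = op(val[p], val[i])
                new_pred[i] = pred[p]  # jump: pointer distance doubles
        val, pred = new_val, new_pred
    return val
```

For example, the list 0 → 2 → 3 → 1 with string concatenation as `*` gives `list_prefix([2, None, 3, 1], ["a", "b", "c", "d"], lambda x, y: x + y) == ["a", "acdb", "ac", "acd"]`; results are stored at each node's index, not in list order.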

Citations

Journal Article

Generic parallel adaptive-grid Navier-Stokes algorithm

Yannis Kallinderis¹, A. Vidwans¹

University of Texas at Austin¹

01 Jan 1994, AIAA Journal

TL;DR: A parallel adaptive-grid Navier-Stokes algorithm has been developed from generic primitives, which allowed a relatively simple implementation on two different parallel systems: an eight-processor Cray Y-MP and the Connection Machine CM-2.

Abstract: A parallel adaptive-grid Navier-Stokes algorithm based on generic primitives has been developed. The parallel primitives are general for the class of explicit finite-volume Navier-Stokes numerical schemes. Furthermore, they allowed a relatively simple implementation of the algorithm on two different parallel systems: an eight-processor Cray Y-MP and the Connection Machine CM-2. A novel data structure for the adaptive grid allowed efficient parallel refinement and coarsening of the mesh. Substantial speedups compared to the corresponding sequential algorithm were realized on both systems.

I. Introduction

Computational fluid dynamics (CFD) has advanced rapidly over the last two decades and is recognized as a valuable tool for engineering design. However, numerical simulation of viscous flowfields remains very expensive even with current vector computers. Advances in numerical algorithms are not expected to reduce the cost of these computations to the point where they can routinely be applied in design. Vector computers accelerated computations by one or two orders of magnitude compared to scalar machines, which is not sufficient for efficient large-scale flow simulations. Another approach to computer architecture has been to employ a number of processors that work in parallel on the same job. Parallel computing appears to be a promising approach for future design applications of CFD. Developing CFD applications on state-of-the-art parallel machines currently requires considerable effort from the user to understand the intricacies of the underlying architecture and to fine-tune the application to match them. Most of this effort has to be duplicated when the same application is ported to another machine. This inefficiency can largely be eliminated by allowing the user to design the application in a machine-independent fashion using general primitives.
Several parallel algorithms have been developed in the past. The architectures employed include the Cray Y-MP/8 [1, 2] and the Connection Machine CM-2 [3–5]. Many of the applications to date have dealt with structured meshes, in which data are stored in a regular manner. Unstructured-grid solvers have also been developed [4, 6, 7]. Many parallel CFD codes have been developed for a specific architecture, without considering the portability of the algorithms. Adaptive algorithms have become quite popular in CFD. They provide the flexibility to adjust the grid during the solution procedure without intervention by the user [8–10]. These adaptive-grid algorithms were developed for sequential execution and have reached a level of maturity. However, the area of parallel adaptive algorithms is relatively unexplored [1]. The present work develops a parallel adaptive algorithm that employs generic primitives. Those parallel primitives are

18 citations

Proceedings Article

Matching partition a linked list and its optimization

Yijie Han¹

University of Kentucky¹

01 Mar 1989

TL;DR: The curve O( n log i p + log n + log i) is shown for the time complexity of computing a maximal matching for a linked list, where n is the size of the input list, p is the number of processors used in the algorithm and i is an adjustable parameter.

Abstract: We show the curve O( n log i p + log n + log i) for the time complexity of computing a maximal matching for a linked list, where n is the size of the input list, p is the number of processors used in the algorithm and i is an adjustable parameter. For all constructible i the time complexity represented by the curve can be realized. Our algorithm is optimal using up to O( n log n ) processors with an arbitrarily large constant i. This algorithm can be used to compute a maximal independent set or a 3-coloring for a linked list.
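
Han's matching-partition algorithm itself is not reproduced here. As a related, swapped-in illustration, the sketch below 3-colors a linked list with deterministic coin tossing (the Cole–Vishkin technique, which needs O(log* n) coin-tossing rounds) and then derives a maximal independent set from the coloring, handling one color class per step. The function names and the `succ` encoding (`None` at the tail) are hypothetical, and each loop over all nodes stands in for one synchronous parallel step.

```python
from typing import List, Optional, Set


def predecessors(succ: List[Optional[int]]) -> List[Optional[int]]:
    """Invert the successor array; None marks the head (hypothetical helper)."""
    pred: List[Optional[int]] = [None] * len(succ)
    for i, s in enumerate(succ):
        if s is not None:
            pred[s] = i
    return pred


def lowest_diff_bit(a: int, b: int) -> int:
    """Position of the lowest bit in which a and b differ (requires a != b)."""
    x = a ^ b
    return (x & -x).bit_length() - 1


def three_color(succ: List[Optional[int]]) -> List[int]:
    """Proper 3-coloring of a linked list by deterministic coin tossing."""
    n = len(succ)
    pred = predecessors(succ)
    color = list(range(n))  # distinct indices: a proper n-coloring
    # Coin-tossing rounds: a b-bit color becomes 2k + (bit k), with k < b,
    # so the palette shrinks quickly until every color fits in 0..5.
    while max(color, default=0) > 5:
        new = list(color)
        for v in range(n):
            s = succ[v]
            # The tail has no successor; it simply uses bit 0.
            k = 0 if s is None else lowest_diff_bit(color[v], color[s])
            new[v] = 2 * k + ((color[v] >> k) & 1)
        color = new
    # Eliminate colors 5, 4, 3. Nodes sharing a color are never adjacent,
    # so a whole color class can recolor in one step without conflicts.
    for c in (5, 4, 3):
        new = list(color)
        for v in range(n):
            if color[v] == c:
                used = {color[u] for u in (pred[v], succ[v]) if u is not None}
                new[v] = min({0, 1, 2} - used)
        color = new
    return color


def independent_set(succ: List[Optional[int]], color: List[int]) -> Set[int]:
    """Maximal independent set from a proper 3-coloring, one step per color."""
    pred = predecessors(succ)
    chosen: Set[int] = set()
    for c in (0, 1, 2):
        # Same-colored nodes are non-adjacent, so their choices are independent.
        for v in range(len(succ)):
            if color[v] == c and pred[v] not in chosen and succ[v] not in chosen:
                chosen.add(v)
    return chosen
```

The coin-tossing step keeps the coloring proper because two consecutive nodes either pick different bit positions k or, if they pick the same k, differ in bit k of their previous colors.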

18 citations
