Journal Article

The power of parallel prefix

TL;DR: This study solves the prefix computation problem when the order of the elements is specified by a linked list. It assumes the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O((log n) log(2n/p) + n/p) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
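To make the linked-list version of the problem concrete, here is a sketch of Wyllie's classic pointer-jumping technique. It simulates one synchronous round per loop iteration. This is a textbook baseline, not the algorithm of this paper. It finishes in about log2 n rounds but performs O(n log n) total work, so it does not achieve the linear speedup claimed above. The array encoding (`values`, and `succ` with `None` marking the tail) and the name `list_prefix` are assumptions made for illustration.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def list_prefix(values: List[T], succ: List[Optional[int]],
                op: Callable[[T, T], T]) -> List[T]:
    """Prefix products along a linked list by pointer jumping (Wyllie-style).

    values[i] is the element stored at node i; succ[i] is the index of the
    next node in list order, or None at the tail. Returns result[i] = the
    product of all elements from the head up to and including node i.
    """
    n = len(values)
    # Predecessor pointers: each node accumulates from the head side.
    pred: List[Optional[int]] = [None] * n
    for i, j in enumerate(succ):
        if j is not None:
            pred[j] = i
    # Invariant: acc[i] is the product of the list segment that starts just
    # after pred[i] (or at the head if pred[i] is None) and ends at node i.
    acc = list(values)
    while any(p is not None for p in pred):
        # One synchronous round: each node reads its current predecessor's
        # cells; since every node has at most one successor, no two nodes
        # read the same predecessor.
        new_acc = [op(acc[p], acc[i]) if p is not None else acc[i]
                   for i, p in enumerate(pred)]
        new_pred = [pred[p] if p is not None else None for p in pred]
        acc, pred = new_acc, new_pred
    return acc


# List order is node 2 -> 0 -> 3 -> 1, holding "a", "b", "c", "d".
print(list_prefix(["b", "d", "a", "c"], [3, None, 0, 1], lambda x, y: x + y))
```

The operator is always applied as `op(earlier, later)`, so the sketch also handles non-commutative operations such as the string concatenation in the usage line, which prints `['ab', 'abcd', 'a', 'abc']`.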
Citations
Journal Article
TL;DR: A parallel adaptive-grid Navier-Stokes algorithm based on generic primitives has been developed. The primitives allowed relatively simple implementation of the algorithm on two different parallel systems: an eight-processor Cray Y-MP and the Connection Machine CM-2.
Abstract: A parallel adaptive-grid Navier-Stokes algorithm based on generic primitives has been developed. The parallel primitives are general for the class of explicit finite-volume Navier-Stokes numerical schemes. Furthermore, they allowed relatively simple implementation of the algorithm on two different parallel systems: an eight-processor Cray Y-MP and the Connection Machine CM-2. A novel data structure for the adaptive grid allowed efficient parallel refinement/coarsening of the mesh. Substantial speedups compared to the corresponding sequential algorithm were realized on both systems.

I. Introduction

Computational fluid dynamics (CFD) has advanced rapidly over the last two decades, and it is recognized as a valuable tool for engineering design. However, numerical simulation of viscous flowfields remains very expensive even with current vector computers. Advances in numerical algorithms are not expected to reduce the cost of those computations to the extent that they can routinely be applied for design. Vector computers accelerated computations by one or two orders of magnitude compared to scalar machines, which is not sufficient for efficient large-scale flow simulations. Another approach to computer architecture has been to employ a number of processors that work in parallel on the same job. Parallel computing appears to be a promising approach for future design applications of CFD. Developing CFD applications on state-of-the-art parallel machines currently requires considerable effort from the user, who must understand the intricacies of the underlying architecture and fine-tune the application to match them. Most of this effort has to be duplicated when the same application is ported to another machine. This inefficiency can be largely eliminated by allowing the user to design the application in a machine-independent fashion using general primitives.

Several parallel algorithms have been developed in the past. The architectures that have been employed include the Cray Y-MP/8 [1, 2] as well as the Connection Machine CM-2 [3-5]. Many of the applications to date have dealt with structured meshes, in which data is stored in a regular manner. Unstructured grid solvers have also been developed [4, 6, 7]. A large number of parallel CFD codes have been developed for a specific architecture, and portability of the algorithms has not been considered. Adaptive algorithms have become quite popular in CFD. They provide the flexibility to adjust the grid during the solution procedure without intervention by the user [8-10]. Those adaptive grid algorithms have been developed for sequential execution and have reached a level of maturity. However, the area of parallel adaptive algorithms is relatively unexplored [1]. The present work develops a parallel adaptive algorithm that employs generic primitives. Those parallel primitives are

18 citations

Proceedings Article
Yijie Han
01 Mar 1989
TL;DR: The curve O( n log i p + log n + log i) is shown for the time complexity of computing a maximal matching for a linked list, where n is the size of the input list, p is the number of processors used in the algorithm and i is an adjustable parameter.
Abstract: We show the curve O( n log i p + log n + log i) for the time complexity of computing a maximal matching for a linked list, where n is the size of the input list, p is the number of processors used in the algorithm and i is an adjustable parameter. For all constructible i, the time complexity represented by the curve can be realized. Our algorithm is optimal using up to O( n log n ) processors with an arbitrarily large constant i. This algorithm can be used to compute a maximal independent set or a 3-coloring for a linked list.
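As a reference point for the matching problem, here is a sketch that computes a maximal matching of a linked list sequentially in O(n) time. It walks from the head and pairs each unmatched node with its successor. It then derives a maximal independent set from that matching. This is an illustrative sequential baseline, not Han's parallel algorithm. The function names and the `succ` array encoding are hypothetical, and matched pairs are assumed to be stored as (node, successor).

```python
from typing import List, Optional, Set, Tuple


def greedy_list_matching(head: Optional[int],
                         succ: List[Optional[int]]) -> List[Tuple[int, int]]:
    """Maximal matching of a linked list: pair each node with its successor,
    then skip past the pair. At most the final node is left unmatched."""
    matching: List[Tuple[int, int]] = []
    v = head
    while v is not None and succ[v] is not None:
        w = succ[v]
        matching.append((v, w))
        v = succ[w]  # next unmatched node, or None at the end of the list
    return matching


def mis_from_matching(succ: List[Optional[int]],
                      matching: List[Tuple[int, int]]) -> Set[int]:
    """Derive a maximal independent set from a maximal matching whose
    edges are given as (node, successor) pairs."""
    n = len(succ)
    pred: List[Optional[int]] = [None] * n
    for i, j in enumerate(succ):
        if j is not None:
            pred[j] = i
    heads = {v for v, _ in matching}  # first node of every matched edge
    matched = heads | {w for _, w in matching}
    mis = set(heads)
    for u in range(n):
        if u in matched:
            continue
        # By maximality every neighbor of u is matched; keep u only if no
        # neighbor is a head, so that u is not already dominated.
        neighbors = [x for x in (pred[u], succ[u]) if x is not None]
        if not any(x in heads for x in neighbors):
            mis.add(u)
    return mis


succ = [1, 2, 3, 4, None]  # list 0 -> 1 -> 2 -> 3 -> 4
m = greedy_list_matching(0, succ)
print(m, mis_from_matching(succ, m))
```

The set of heads is independent because matched edges are disjoint. Maximality of the matching guarantees that no two unmatched nodes are adjacent, so adding the undominated unmatched nodes keeps the set independent and makes it maximal.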

18 citations

Journal Article
TL;DR: A data-parallel volume rendering algorithm with numerous advantages over previously published solutions is presented, along with a new five-pass processor permutation assignment and a new parallel compositing technique that is essential for scaling linearly on machines that have more processors than view rays to process (P

17 citations


Cites methods from "The power of parallel prefix"

  • ...A parallel product evaluation [16] is work-efficient up to P = O(n/P + log P) processors in the view depth dimension, Figs....


  • ...Binary tree combining computes products for any associative (not necessarily commutative) operator ⊗, I_1 ⊗ I_2 ⊗ · · · ⊗ I_W [16, 25]....

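Binary tree combining for an associative, possibly non-commutative operator can be sketched as a recursive pairwise scan. Adjacent elements are combined one tree level at a time, the pairwise products are scanned recursively, and the missing prefixes are filled in on the way back up. All combinations within a level are independent, which gives O(log n) depth and O(n) total operations. This is a generic sketch of the technique, not code from the citing paper or from [16]. The name `tree_scan` is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix products x0, x0*x1, ..., via binary tree combining.

    Only associativity of `op` is assumed; the left operand is always the
    earlier part of the sequence, so non-commutative operators are fine.
    No identity element is needed.
    """
    n = len(xs)
    if n <= 1:
        return list(xs)
    # Up one tree level: combine adjacent pairs (independent operations).
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(n // 2)]
    # sub[i] = xs[0] * ... * xs[2i + 1]
    sub = tree_scan(pairs, op)
    out: List[T] = [xs[0]] * n
    for k in range(1, n):  # each k is independent of the others
        if k % 2 == 1:
            out[k] = sub[k // 2]  # prefix ending at an odd index
        else:
            out[k] = op(sub[k // 2 - 1], xs[k])  # extend the previous odd prefix
    return out


print(tree_scan(list("abcde"), lambda x, y: x + y))
```

String concatenation is associative but not commutative, so the usage line checks that order is preserved: it prints `['a', 'ab', 'abc', 'abcd', 'abcde']`.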

Book Chapter
25 Aug 1994
TL;DR: This paper presents fast and efficient parallel algorithms for implementing operations on PPQs that maintain data items with real-valued keys. These algorithms have considerably smaller time and/or work bounds than the previously best known algorithms.
Abstract: The Parallel Priority Queue (PPQ) data structure supports parallel operations for manipulating data items with keys, such as inserting n new items, deleting the n items with the smallest keys, creating a new PPQ that contains a set of items, and melding two PPQs into one. In this paper, we present fast and efficient parallel algorithms for implementing operations on PPQs that maintain data items with real-valued keys. The data structures that we use for implementing the PPQs are the unmeldable and meldable parallel heaps. Our algorithms have considerably smaller time and/or work bounds than the previously best known algorithms, and they use a less powerful parallel computational model (the EREW PRAM). The new ideas that make our improvement possible are two partition schemes dynamically maintained on the parallel heap structures: the minimal-path partition and the right-path partition. These partition schemes could be of interest in their own right. Our results also lead to optimal parallel algorithms for implementing sequential operations on several traditional heap structures.
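To pin down what the four PPQ operations listed above mean, here is a sketch that models them sequentially on top of Python's standard `heapq` module. It is only a reference model of the interface. It uses a single binary heap and no parallelism, so it says nothing about the parallel heaps or partition schemes of the paper. The class and method names are hypothetical.

```python
import heapq
from typing import Iterable, List


class BatchPriorityQueue:
    """Sequential reference model of the batch PPQ operations (hypothetical
    names); keys are real-valued, as in the abstract."""

    def __init__(self, items: Iterable[float] = ()) -> None:
        # "Create a new PPQ that contains a set of items."
        self._heap: List[float] = list(items)
        heapq.heapify(self._heap)

    def insert_batch(self, items: Iterable[float]) -> None:
        # "Insert n new items" (one at a time here, not in parallel).
        for x in items:
            heapq.heappush(self._heap, x)

    def delete_min_batch(self, k: int) -> List[float]:
        # "Delete the k items with the smallest keys", returned in sorted order.
        k = min(k, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(k)]

    def meld(self, other: "BatchPriorityQueue") -> "BatchPriorityQueue":
        # "Meld two PPQs into one"; this model returns a new queue and
        # leaves both inputs unchanged.
        return BatchPriorityQueue(self._heap + other._heap)

    def __len__(self) -> int:
        return len(self._heap)


q = BatchPriorityQueue([5.0, 1.0, 3.0])
q.insert_batch([4.0, 2.0])
print(q.delete_min_batch(2))
```

The non-destructive `meld` is a simplification chosen for clarity. A meldable heap would normally combine the two structures in place.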

17 citations

Journal Article
TL;DR: A linear-time algorithm for finding all hinge vertices of a permutation graph, which can be used to identify critical nodes in a real network.

17 citations