Journal ArticleDOI

The power of parallel prefix

TL;DR: This study solves the prefix computation problem when the order of the elements is specified by a linked list, assuming the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products a1 * … * ai, i = 1, …, n, of a set of n elements, where * is an associative operation. An O(((log n)/log(2n/p)) × (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^(1−ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
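The paper's work-optimal deterministic EREW algorithm is considerably more involved than what fits here; as an illustrative baseline only, the sketch below simulates the classic pointer-jumping scheme for prefix computation over a linked list, which runs in O(log n) synchronous rounds but does O(n log n) work. The list, the values, and the use of integer addition for * are made-up example data, not anything from the paper.

```cpp
// Illustrative baseline only, not the paper's work-optimal EREW algorithm:
// pointer-jumping prefix computation over a linked list, simulated
// sequentially one synchronous PRAM round at a time.
#include <cstdio>
#include <vector>

int main() {
    // List order 0 -> 2 -> 3 -> 1, encoded by predecessor pointers:
    // pred[i] is the element before i in list order, -1 at the head.
    std::vector<int> pred = {-1, 3, 0, 2};
    std::vector<long long> val = {5, 1, 7, 2};   // a_i; '*' is integer '+' here
    std::vector<int> p2(pred.size());
    std::vector<long long> v2(val.size());
    bool active = true;
    while (active) {                  // O(log n) rounds in the parallel setting
        active = false;
        for (std::size_t i = 0; i < pred.size(); ++i) {   // "for all i in parallel"
            if (pred[i] != -1) {
                v2[i] = val[pred[i]] + val[i];   // fold in the block ending at pred[i]
                p2[i] = pred[pred[i]];           // then jump twice as far back
                active = true;
            } else {
                v2[i] = val[i];
                p2[i] = -1;
            }
        }
        val.swap(v2);                 // all reads precede all writes, as on a PRAM
        pred.swap(p2);
    }
    // val[i] is now the product of all elements up to and including i in
    // list order; printed by index this gives: 5 15 12 14
    for (long long v : val) std::printf("%lld ", v);
    std::printf("\n");
    return 0;
}
```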
Citations
Journal ArticleDOI
TL;DR: A parallel algorithm for random walk generation in regular as well as irregular regions is presented and is shown to ideally fit on a hypercube of n nodes, where n is the number of processors.
Abstract: Random walks are widely applicable in statistical and scientific computations. In particular, they are used in the Monte Carlo method to solve elliptic and parabolic partial differential equations (PDEs). This method holds several advantages over other methods for PDEs as it solves problems with irregular boundaries and/or discontinuities, gives solutions at individual points, and exhibits great parallelism. However, the generation of each random walk in the Monte Carlo method has been done sequentially because each point in the walk is derived from the preceding point by moving one grid step along a randomly selected direction. A parallel algorithm for random walk generation in regular as well as irregular regions is presented. The algorithm is based on parallel prefix computations. The communication structure of the algorithm is shown to ideally fit on a hypercube of n nodes, where n is the number of processors.
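As a minimal sketch of the connection to prefix computation, assuming an unbounded regular grid so that boundaries can be ignored (the cited paper's handling of irregular regions and its hypercube mapping are not shown), the walk positions are simply the inclusive scan of independently drawn unit steps under componentwise addition, which is associative and therefore amenable to any parallel scan:

```cpp
// Hedged sketch: on an unbounded regular grid, a random walk is the prefix
// sum of its i.i.d. unit steps, so all positions follow from one scan.
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

struct Step { int dx, dy; };

int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dir(0, 3);
    const int n = 16;
    std::vector<Step> steps(n);
    for (Step& s : steps) {           // draw unit steps E/W/N/S independently
        static const Step moves[4] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        s = moves[dir(rng)];
    }
    std::vector<Step> pos(n);
    // The scan operator (componentwise addition) is associative, which is all
    // a parallel prefix implementation needs.
    std::inclusive_scan(steps.begin(), steps.end(), pos.begin(),
                        [](Step a, Step b) { return Step{a.dx + b.dx, a.dy + b.dy}; });
    for (const Step& p : pos) std::printf("(%d,%d) ", p.dx, p.dy);
    std::printf("\n");
    return 0;
}
```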

4 citations

Journal ArticleDOI
TL;DR: A routing strategy to ensure the edge-disjointness of the routing paths in executing binary tree algorithms is identified and the fault tolerance of the embedding method is discussed.

4 citations

Proceedings ArticleDOI
16 May 2011
TL;DR: This paper presents a shared-memory programming framework that allows tasks to dynamically spawn subtasks with a given degree of parallelism for implementing tightly coupled parallel parts of the algorithm, and presents a new algorithm for work-stealing with deterministic team-building.
Abstract: Parallelizing complex applications, even for well-behaved parallel systems, often calls for different parallelization approaches within the same application. In this paper we discuss three applications from the literature that, for reasons of both efficiency and expressive convenience, benefit from a mixture of task and more tightly coupled data parallelism. These three applications, namely Quicksort, list ranking, and LU factorization with partial pivoting, are paradigms for recursive, mixed-mode parallel algorithms that can neither easily nor efficiently be expressed in either a purely data-parallel or a purely task-parallel fashion. As a solution we present a shared-memory programming framework that allows tasks to dynamically spawn subtasks with a given degree of parallelism for implementing tightly coupled parallel parts of the algorithm. All three paradigmatic applications can naturally be expressed in this framework, which in turn can be supported by an extended, non-conventional work-stealing scheduler, which we also briefly sketch. Using our new algorithm for work-stealing with deterministic team-building, we show, beyond the improved and more natural implementability, better scalability in many cases and sometimes better absolute performance than with less natural implementations based on pure task parallelism executed with conventional work-stealing. Detailed performance results using an Intel 32-core system substantiate our claims.
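The framework and its deterministic team-building scheduler are the paper's own and are not reproduced here; as a hedged stand-in, the sketch below shows only the recursive task-spawning shape of the Quicksort example using plain std::async, with a depth parameter loosely playing the role of a requested degree of parallelism (no work-stealing or team-building is modeled).

```cpp
// Hedged stand-in for recursive mixed-mode parallelism, not the authors'
// framework: Quicksort where each call may spawn its left half as a subtask.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <future>
#include <vector>

void quicksort(std::vector<int>& v, int lo, int hi, int depth) {
    if (hi - lo < 2) return;
    int pivot = v[lo + (hi - lo) / 2];
    int* base = v.data();
    // Three-way split: < pivot, == pivot, > pivot. The middle block is never
    // empty, so the recursion always shrinks.
    int* m1 = std::partition(base + lo, base + hi,
                             [pivot](int x) { return x < pivot; });
    int* m2 = std::partition(m1, base + hi,
                             [pivot](int x) { return x == pivot; });
    int a = static_cast<int>(m1 - base);
    int b = static_cast<int>(m2 - base);
    if (depth > 0) {
        // Spawn the left part as a new asynchronous task; recurse locally on
        // the right part. The two ranges are disjoint, so there is no race.
        auto left = std::async(std::launch::async, quicksort,
                               std::ref(v), lo, a, depth - 1);
        quicksort(v, b, hi, depth - 1);
        left.get();
    } else {
        quicksort(v, lo, a, 0);
        quicksort(v, b, hi, 0);
    }
}

int main() {
    std::vector<int> v = {9, 3, 7, 1, 8, 2, 6, 5, 4, 0};
    quicksort(v, 0, static_cast<int>(v.size()), /*depth=*/2);  // at most 4 concurrent tasks
    for (int x : v) std::printf("%d ", x);
    std::printf("\n");
    return 0;
}
```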

4 citations


Cites background from "The power of parallel prefix"

  • ...The operation of splicing out of a list element is called pair_off in [19]....


Proceedings ArticleDOI
16 Aug 1993
TL;DR: It is shown that articulation points and bridges of permutation graphs can be found in O(log n) time using O(n/log n) processors on an EREW PRAM.
Abstract: We show that articulation points and bridges of permutation graphs can be found in O(log n) time using O(n/log n) processors on an EREW PRAM. The algorithms are optimal with respect to the time-processor product.

4 citations

Journal ArticleDOI
TL;DR: In this article, a hierarchical prefix scan algorithm is proposed to reduce the time of registration of a series of electron microscopy images to less than 3 minutes by translating the image registration into a specific instance of the prefix scan.
Abstract: Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this article, we study the recursive registration of a series of electron microscopy images, a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce the time-to-solution for a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1024 Intel Haswell cores), enabling derivation of material properties at the nanoscale for long microscopy image series.
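A minimal sketch of the reduction itself (the names Shift and compose are hypothetical, the drift values are dummy data, and the paper's hierarchical, work-stealing scan is not reproduced): if registering each frame against its predecessor yields a translation, then the absolute alignment of every frame is the inclusive scan of those pairwise translations under an associative compose operator, so any parallel prefix-scan algorithm applies.

```cpp
// Hedged sketch of the registration-as-scan reduction with dummy data;
// real registration would estimate the pairwise shifts from image content.
#include <cstdio>
#include <numeric>
#include <vector>

struct Shift { double dx, dy; };   // a pure-translation transform

// Composing two translations is associative, which is what a scan requires.
static Shift compose(Shift a, Shift b) { return Shift{a.dx + b.dx, a.dy + b.dy}; }

int main() {
    // Pairwise drift estimates between consecutive microscopy frames (dummy data).
    std::vector<Shift> pairwise = {{0.0, 0.0}, {0.4, -0.1}, {0.5, 0.0}, {0.3, 0.2}};
    std::vector<Shift> absolute(pairwise.size());
    std::inclusive_scan(pairwise.begin(), pairwise.end(), absolute.begin(), compose);
    for (const Shift& s : absolute)
        std::printf("(%.1f, %.1f) ", s.dx, s.dy);   // drift of each frame vs. frame 0
    std::printf("\n");
    return 0;
}
```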

4 citations