The power of parallel prefix

doi:10.1109/TC.1985.6312202

Journal Article•DOI•

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Carnegie Mellon University², Hebrew University of Jerusalem³

01 Oct 1985-IEEE Transactions on Computers (IEEE)-Vol. 34, Iss: 10, pp 965-968

TL;DR: This study presents a deterministic parallel algorithm for the prefix computation problem when the order of the elements is specified by a linked list, assuming the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).

Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O((log n / log(2n/p)) · (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem, when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
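To make the problem statement concrete, here is a minimal sketch of prefix computation over a linked list using the classic pointer-jumping technique, simulated sequentially in Python. This is not the algorithm of the paper: pointer jumping uses one simulated processor per element and O(log n) rounds, so it performs O(n log n) total work and does not give linear speedup. The function name `list_prefix` and the predecessor-array representation `pred` are assumptions chosen for illustration.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def list_prefix(
    values: List[T],
    pred: List[Optional[int]],
    op: Callable[[T, T], T],
) -> List[T]:
    """Prefix products along a linked list by pointer jumping.

    pred[i] is the index of the node preceding node i in list order,
    or None if node i is the head. op must be associative; it need not
    be commutative. Returns out with out[i] = values[head] * ... * values[i].
    """
    n = len(values)
    acc = list(values)  # acc[i]: product of the nodes after ptr[i] up to and including i
    ptr = list(pred)    # ptr[i]: current (jumped) predecessor pointer of node i
    while any(p is not None for p in ptr):
        new_acc = acc[:]
        new_ptr = ptr[:]
        # Each loop iteration plays the role of one processor in a
        # synchronous round: reads use the old arrays, writes the new ones.
        for i in range(n):
            p = ptr[i]
            if p is not None:
                new_acc[i] = op(acc[p], acc[i])  # earlier segment goes on the left
                new_ptr[i] = ptr[p]              # pointer doubles its reach
        acc, ptr = new_acc, new_ptr
    return acc


# List order is node 2 -> node 0 -> node 3 -> node 1.
pred = [2, 3, None, 0]
print(list_prefix([1, 2, 3, 4], pred, lambda a, b: a + b))
```

String concatenation is a convenient non-commutative operator for checking that the left-to-right list order is respected.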

Citations

Journal Article•DOI•

A parallel algorithm to construct a dominance graph on nonoverlapping rectangles

Alan P. Sprague¹•Institutions (1)

University of Alabama at Birmingham¹

01 Aug 1992-International Journal of Parallel Programming

TL;DR: A parallel algorithm to generate the dominance graph on a collection of nonoverlapping iso-oriented rectangles is presented; the dominance graph is the directed graph that contains an edge from a rectangle b to a rectangle c iff c is immediately above b.

Abstract: A parallel algorithm to generate the dominance graph on a collection of nonoverlapping iso-oriented rectangles is presented. This graph arises from the constraint graph commonly used in compaction algorithms for VLSI circuits. The dominance graph expresses the notion of “aboveness” on a collection of nonoverlapping rectangles: it is the directed graph which contains an edge from a rectangle b to a rectangle c iff c is immediately above b. The algorithm is based on the divide and conquer paradigm; in the EREW PRAM model, it has time complexity O(log² n), using n/log n processors. Its processor-time product is O(n log n), which is optimal.

5 citations

Cites background from "The power of parallel prefix"

...(14) Once this array is computed, O(n'/p') time suffices to move all marked members of Ty to Ty; in particular, each marked Ty[i] is moved to T~[partial_sum[i]]....
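The excerpt describes a standard use of prefix sums: compacting the marked entries of an array by using each entry's partial sum as its destination index. A minimal sequential sketch follows, assuming zero-based indexing and an inclusive scan; the names `compact_marked`, `items`, and `marked` are hypothetical and do not come from the cited paper.

```python
from itertools import accumulate
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compact_marked(items: Sequence[T], marked: Sequence[bool]) -> List[T]:
    """Move every marked item to the front, preserving relative order."""
    # partial_sum[i] = number of marked entries among positions 0..i (inclusive scan).
    partial_sum = list(accumulate(int(m) for m in marked))
    total = partial_sum[-1] if partial_sum else 0
    out: List[T] = [None] * total  # type: ignore[list-item]
    # Every marked item has a distinct destination, so in a parallel
    # setting these writes could all proceed independently.
    for i, item in enumerate(items):
        if marked[i]:
            out[partial_sum[i] - 1] = item
    return out


print(compact_marked(list("abcdef"), [True, False, True, True, False, True]))
```

Because the destinations are distinct, the move step needs no coordination between writers once the partial sums are available.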

Posted Content•

Work-stealing prefix scan: Addressing load imbalance in large-scale image registration.

Marcin Copik¹, Tobias Grosser², Torsten Hoefler¹, Paolo Bientinesi³, Benjamin Berkels⁴•Institutions (4)

ETH Zurich¹, University of Edinburgh², Umeå University³, RWTH Aachen University⁴

23 Oct 2020-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: By identifying a suitable and well-optimized prefix scan algorithm, this article reduces time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes, enabling derivation of material properties at nanoscale for long microscopy image series.

Abstract: Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images - a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1024 Intel Haswell cores), enabling derivation of material properties at nanoscale for long microscopy image series.
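The idea of turning a chain of pairwise registrations into a prefix scan can be sketched with a toy operator. Below, each step is a 1D affine map x -> a*x + b stored as a pair (a, b), and the associative, non-commutative operator is composition; this stands in for the paper's registration operator, which is not described here. The three-phase blocked scan is a sequential simulation of a generic hierarchical scan and does not include the work-stealing procedure. All names (`compose`, `blocked_scan`, `num_blocks`) are assumptions for illustration.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Affine = Tuple[int, int]  # (a, b) represents x -> a*x + b


def compose(f: Affine, g: Affine) -> Affine:
    """Transform that applies f first, then g."""
    return (g[0] * f[0], g[0] * f[1] + g[1])


def blocked_scan(xs: Sequence[T], op: Callable[[T, T], T], num_blocks: int) -> List[T]:
    """Inclusive scan in three phases; phases 1 and 3 are parallel across blocks."""
    n = len(xs)
    if n == 0:
        return []
    size = -(-n // num_blocks)  # ceiling division; requires num_blocks >= 1
    blocks = [xs[i:i + size] for i in range(0, n, size)]
    # Phase 1: independent local scans, one per block.
    local = [list(accumulate(b, op)) for b in blocks]
    # Phase 2: short scan over the block totals.
    totals = list(accumulate((l[-1] for l in local), op))
    # Phase 3: combine the total of all preceding blocks into each later block.
    out = list(local[0])
    for k in range(1, len(local)):
        out.extend(op(totals[k - 1], v) for v in local[k])
    return out


steps = [(2, 1), (3, 0), (1, 5), (2, 2), (1, -1)]
print(blocked_scan(steps, compose, num_blocks=2))
```

Entry i of the result is the cumulative transform from the first frame to frame i. When the operator is expensive and its cost varies per pair, phase 1 blocks finish at different times, which is the kind of imbalance the abstract refers to.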

5 citations

Cites methods from "The power of parallel prefix"

...al [17] presented such algorithm on an EREW model [17]....

Journal Article•DOI•

An optimal parallel algorithm for planar cycle separators

Ming-Yang Kao¹, Shang-Hua Teng², Kentaro Toyama³•Institutions (3)

Duke University¹, Massachusetts Institute of Technology², Yale University³

01 Nov 1995-Algorithmica

TL;DR: An optimal parallel algorithm for computing a cycle separator of an n-vertex embedded planar undirected graph in O(log n) time on n/log n processors is presented, and an improved parallel algorithm is obtained for constructing a depth-first search tree rooted at any given vertex in a connected planar undirected graph.

Abstract: We present an optimal parallel algorithm for computing a cycle separator of an n-vertex embedded planar undirected graph in O(log n) time on n/log n processors. As a consequence, we also obtain an improved parallel algorithm for constructing a depth-first search tree rooted at any given vertex in a connected planar undirected graph in O(log² n) time on n/log n processors. The best previous algorithms for computing depth-first search trees and cycle separators achieved the same time complexities, but with n processors. Our algorithms run on a parallel random access machine that permits concurrent reads and concurrent writes in its shared memory and allows an arbitrary processor to succeed in case of a write conflict.

5 citations

Cites methods from "The power of parallel prefix"

...Step 5 uses optimal algorithms for prefix computation and list ranking [5], [10], [17], [28], [29]....

Proceedings Article•DOI•

Portable parallel algorithms for geometric problems

Russ Miller¹, Quentin F. Stout²•Institutions (2)

State University of New York System¹, University at Buffalo²

10 Oct 1988

TL;DR: The development of algorithms which can be ported among different fine-grain, massively parallel architectures and yield reasonably good implementations on each is discussed, and sample algorithms are given to solve some fundamental geometric problems.

Abstract: The development of algorithms which can be ported among different fine-grain, massively parallel architectures and yield reasonably good implementations on each is discussed. The approach is to write algorithms in terms of general data movement operations and then implement the data movement operations on the target architecture. Efficient implementation of the data movement operations requires careful programming, but since the data movement operations form the foundation of many programs, the cost of implementing them can be amortized. The use of data movement operations also helps programmers think in terms of higher-level programming units, in the same way that the use of standard data structures helps programmers of serial computers. An approach is described for designing efficient, portable algorithms, and sample algorithms are given to solve some fundamental geometric problems. The difficulties of portability and efficiency for these geometric problems are redirected into similar difficulties for the standardization operations.

5 citations

Cites background from "The power of parallel prefix"

...Interested readers might consult [2, 3, 5, 6, 7] for additional operations and extensive uses of the operations discussed here....
...More recently there have been attempts to promote specific data movement operations as a programming aid [2, 3], or to develop a collection of data movement operations particularly useful for a specific architecture [5]....

Journal Article•DOI•

Finding congruent regions in parallel

Laurence Boxer¹•Institutions (1)

Niagara University¹

01 Jul 1992

TL;DR: The algorithm for finding all polygons in G that are congruent to P requires Θ(n log n) time for a CREW PRAM with m processors, which improves upon the O(n²) time required by the systolic array algorithm of [7].

Abstract: Given a straight-line embedded plane graph G of n edges and a polygon P of m edges, m ≤ n, we describe an algorithm for finding all polygons in G that are congruent to P. Our algorithm requires Θ(n log n) time for a CREW PRAM with m processors. This improves upon the O(n²) time (with m processors) required by the systolic array algorithm of [7]. We also show the problem is in NC by showing how to implement our algorithm in Θ(log n) time using mn processors.

5 citations

Related Papers (5)