The power of parallel prefix

doi:10.1109/TC.1985.6312202

Journal Article•DOI•

Clyde P. Kruskal¹, Larry Rudolph², Marc Snir³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Carnegie Mellon University², Hebrew University of Jerusalem³

01 Oct 1985-IEEE Transactions on Computers (IEEE)-Vol. 34, Iss: 10, pp 965-968

TL;DR: Under the weakest PRAM model, in which each shared memory location can be read or written by only one processor at a time (the EREW model), this study solves the prefix computation problem when the order of the elements is specified by a linked list.

Abstract: The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\left(\frac{\log n}{\log(2n/p)} \cdot \frac{n}{p}\right)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem, when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
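
To make the problem statement concrete, the following is a minimal sketch of prefix computation over a linked list. It is not the algorithm of the paper: it uses the classic pointer-jumping technique, which takes about log n synchronous rounds with one processor per element and therefore performs more total work than a work-optimal method. The rounds are simulated sequentially with double buffering. The function name `linked_list_prefix` and the `pred` array representation (each node stores the index of its predecessor, `None` for the head) are assumptions made for illustration.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def linked_list_prefix(
    values: List[T],
    pred: List[Optional[int]],
    op: Callable[[T, T], T],
) -> List[T]:
    """Return, for every node i, the product of all values from the list
    head up to and including node i, using pointer jumping.

    values[i] is the element stored at node i.
    pred[i] is the index of node i's predecessor (None for the head).
    op must be associative; it need not be commutative.
    """
    n = len(values)
    acc = list(values)  # acc[i]: product of the segment ending at node i
    ptr = list(pred)    # ptr[i]: node just before that segment, or None

    # Each pass of this loop models one synchronous parallel round.
    while any(p is not None for p in ptr):
        new_acc = list(acc)
        new_ptr = list(ptr)
        for i in range(n):  # conceptually, one processor per node
            p = ptr[i]
            if p is not None:
                # The earlier segment goes on the left to preserve order.
                new_acc[i] = op(acc[p], acc[i])
                new_ptr[i] = ptr[p]
        acc, ptr = new_acc, new_ptr
    return acc


# List order is node 2 -> node 0 -> node 3 -> node 1.
values = ["a", "b", "c", "d"]
pred = [2, 3, None, 0]
result = linked_list_prefix(values, pred, lambda x, y: x + y)
```

String concatenation is used in the usage lines because it is associative but not commutative, so the example also checks that list order is respected.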

Citations

Proceedings Article•DOI•

Massively parallel data optimization

K. Knobe, J.D. Lukas, G.L. Steele

10 Oct 1988

TL;DR: It is shown that this compilation approach performs analysis of usage patterns and determines the allocation strategy for each occurrence of an array section, and the potential performance impact of this compilation technology is measured in orders of magnitude rather than percentages.

Abstract: Techniques for the automatic layout of arrays in a Fortran compiler supporting Fortran 8x array features and targeted to the Connection Machine computer system are discussed. The goal is primarily to minimize the costs of moving data between processors and secondarily to minimize memory usage. Improved array layout may allow communications operations to be eliminated or to be replaced by more specialized communications operations with lower costs. The authors discuss a typical example of a code fragment that can be improved by a factor of 2 in memory consumption and a factor of 20 in speed. It is shown that this compilation approach performs analysis of usage patterns and determines the allocation strategy for each occurrence of an array section. The potential performance impact of this compilation technology is measured in orders of magnitude rather than percentages.

26 citations

Book Chapter•DOI•

Parallel Image Processing

Angela Y. Wu¹•Institutions (1)

American University¹

01 Jan 2001

TL;DR: This chapter describes parallel computers suitable for image processing tasks, including meshes, pyramids, and hypercubes, and discusses parallel algorithms for pixel-level and region-level processing.

Abstract: This chapter reviews basic work on parallel image processing and analysis, with emphasis on work done at the Computer Vision Laboratory at the University of Maryland. It describes parallel computers suitable for image processing tasks, including meshes, pyramids, and hypercubes, and discusses parallel algorithms for pixel-level and region-level processing.

25 citations

Proceedings Article•

A Framework for Performance Assessment of Parallel Bi-Directorial Heuristic Search.

Rehab Duwairi, Basel Mahafzah, Abdel Elah Al-Ayyoub

01 Jan 2002

TL;DR: This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry, and demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix.

Abstract: This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are pro-

25 citations

Journal Article•DOI•

Optimal parallel algorithms for finding cut vertices and bridges of interval graphs

Alan P. Sprague¹, K. H. Kulkarni²•Institutions (2)

University UCINF¹, Rust College²

19 Jun 1992-Information Processing Letters

TL;DR: An $O(\log n)$ time algorithm in the EREW PRAM model, using $n/\log n$ processors, to find cut vertices, bridges, and blocks (often called biconnected components) of an interval graph having $n$ vertices is presented.

25 citations

Cites background from "The power of parallel prefix"

...Compressing, or packing, a sparse array is a standard use of parallel prefix [10]....
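
As an illustration of the packing use mentioned in this excerpt, here is a minimal sketch, not taken from either paper. Each slot is flagged 1 if occupied and 0 if empty, and an exclusive prefix sum over the flags gives every occupied element its destination index in the packed array. `itertools.accumulate` stands in sequentially for a parallel prefix routine; the function name `pack` and the use of `None` to mark empty slots are assumptions.

```python
from itertools import accumulate
from typing import List, Optional, TypeVar

T = TypeVar("T")


def pack(sparse: List[Optional[T]]) -> List[T]:
    """Compress a sparse array by moving its non-None entries to the
    front, preserving their relative order."""
    # Step 1: flag occupied slots.
    flags = [0 if x is None else 1 for x in sparse]

    # Step 2: prefix sums of the flags. A parallel prefix routine would
    # replace this sequential accumulate call.
    inclusive = list(accumulate(flags))
    # Exclusive sum = number of occupied slots strictly before slot i.
    offsets = [s - f for s, f in zip(inclusive, flags)]

    # Step 3: scatter. Every write targets a distinct index, so the
    # iterations are independent of one another.
    total = inclusive[-1] if inclusive else 0
    packed: List[T] = [None] * total  # type: ignore[list-item]
    for i, x in enumerate(sparse):
        if flags[i]:
            packed[offsets[i]] = x
    return packed


packed_example = pack([None, 7, None, None, 3, 9, None])
```

Because the flags are derived from `None` rather than truthiness, legitimate zero values are kept in the packed result.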

Journal Article•DOI•

Faster optimal parallel prefix circuits: New algorithmic construction

Yen-Chun Lin¹, Chin-Yu Su¹•Institutions (1)

National Taiwan University of Science and Technology¹

01 Dec 2005-Journal of Parallel and Distributed Computing

TL;DR: This paper presents a new algorithm to design parallel prefix circuits, and constructs a class of depth-size optimal parallel prefix circuits, named SU4, with fan-out 4, which has the smallest depth among all known depth-size optimal prefix circuits with fan-out 4.

24 citations

Cites background or methods from "The power of parallel prefix"

...To accelerate the prefix operation, many parallel prefix algorithms for various parallel computing models have also been proposed [1,7,9,15,17,19,22,23,27,31,32,34–36]....
...Many have reported its important role in various applications, such as cryptography, binary addition, biological sequence comparison, design of silicon compilers, image processing, job scheduling, loop parallelization, polynomial evaluation, processor allocation, and sorting [1–3,8,10,11,17–19,21,40,42,44,45]....