The power of parallel prefix
TL;DR: This study presents a deterministic parallel algorithm for the prefix computation problem when the order of the elements is specified by a linked list. It assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of n elements, where * is an associative operation. An $O\left(\frac{\log n}{\log(2n/p)} \cdot \frac{n}{p}\right)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
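To make the problem statement concrete, here is a minimal Python sketch of prefix computation over a linked list using classic pointer jumping. This is not the algorithm of the paper: pointer jumping uses n processors for O(log n) rounds and performs O(n log n) total work, so it is not work-optimal. The function name `list_prefix` and the array-of-successors representation are assumptions made for illustration, and the parallel rounds are simulated sequentially.

```python
from operator import add


def list_prefix(values, succ, op=add):
    """Prefix products along a linked list, by simulated pointer jumping.

    values[i] is the element stored at node i, and succ[i] is the index of
    the next node in list order (None for the last node). Returns acc where
    acc[i] is the op-product of all elements from the list head up to node i.
    op must be associative; it need not be commutative.
    """
    n = len(values)
    # pred[i] is the node before i in list order, or None for the head.
    pred = [None] * n
    for i, s in enumerate(succ):
        if s is not None:
            pred[s] = i

    acc = list(values)
    # Each pass of this loop stands for one synchronous parallel round:
    # every node reads the old arrays and writes only its own new entries.
    while any(p is not None for p in pred):
        new_acc = list(acc)
        new_pred = list(pred)
        for i in range(n):
            p = pred[i]
            if p is not None:
                # acc[p] covers the segment that ends just before acc[i]'s.
                new_acc[i] = op(acc[p], acc[i])
                new_pred[i] = pred[p]  # jump over the predecessor
        acc, pred = new_acc, new_pred
    return acc


if __name__ == "__main__":
    # List order is node 2 -> node 0 -> node 3 -> node 1.
    print(list_prefix(["b", "d", "a", "c"], [3, None, 0, 1]))
```

String concatenation is used in the example because it is associative but not commutative, so the output shows that list order is respected. The number of rounds is about log2 of the list length, since every round doubles the length of the segment each node has accumulated.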
Citations
••
10 Oct 1988
TL;DR: It is shown that this compilation approach performs analysis of usage patterns and determines the allocation strategy for each occurrence of an array section, and the potential performance impact of this compilation technology is measured in orders of magnitude rather than percentages.
Abstract: Techniques for the automatic layout of arrays in a Fortran compiler supporting Fortran 8x array features and targeted to the Connection Machine computer system are discussed. The goal is primarily to minimize the costs of moving data between processors and secondarily to minimize memory usage. Improved array layout may allow communications operations to be eliminated or to be replaced by more specialized communications operations with lower costs. The authors discuss a typical example of a code fragment that can be improved by a factor of 2 in memory consumption and a factor of 20 in speed. It is shown that this compilation approach performs analysis of usage patterns and determines the allocation strategy for each occurrence of an array section. The potential performance impact of this compilation technology is measured in orders of magnitude rather than percentages.
26 citations
••
01 Jan 2001
TL;DR: This chapter describes parallel computers suitable for image processing tasks, including meshes, pyramids, and hypercubes, and discusses parallel algorithms for pixel-level and region-level processing.
Abstract: This chapter reviews basic work on parallel image processing and analysis, with emphasis on work done at the Computer Vision Laboratory at the University of Maryland. It describes parallel computers suitable for image processing tasks, including meshes, pyramids, and hypercubes, and discusses parallel algorithms for pixel-level and region-level processing.
25 citations
•
01 Jan 2002
TL;DR: This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry, and demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix.
Abstract: This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are pro-
25 citations
••
TL;DR: An O(log n) time algorithm in the EREW PRAM model, using n/log n processors, to find cut vertices, bridges, and blocks (often called biconnected components) of an interval graph having n vertices is presented.
25 citations
Cites background from "The power of parallel prefix"
...Compressing, or packing, a sparse array is a standard use of parallel prefix [10]....
[...]
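The packing use mentioned in the excerpt above can be sketched as follows. This is a minimal Python illustration, not code from the citing paper: the function name `pack` and the `keep` flag list are assumptions, and the prefix sum is computed sequentially with `itertools.accumulate`, whereas a parallel implementation would substitute a parallel prefix routine for that one step.

```python
from itertools import accumulate


def pack(items, keep):
    """Compact the items whose keep flag is true, preserving their order."""
    flags = [1 if k else 0 for k in keep]
    # Inclusive prefix sum of the flags; the only step that needs a scan.
    inclusive = list(accumulate(flags))
    # Exclusive prefix sum: the number of kept items strictly before i,
    # which is exactly the output slot of item i if it is kept.
    offsets = [s - f for s, f in zip(inclusive, flags)]
    total = inclusive[-1] if inclusive else 0

    out = [None] * total
    # Every kept item writes to a distinct slot, so these writes are
    # independent of one another and could run in parallel.
    for i, item in enumerate(items):
        if flags[i]:
            out[offsets[i]] = item
    return out


if __name__ == "__main__":
    sparse = [5, 0, 0, 7, 0, 9]
    print(pack(sparse, [x != 0 for x in sparse]))
```

Because the offsets come from a prefix sum, no two kept items ever target the same output position, which is what makes the final scatter step safe without any coordination.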
••
TL;DR: This paper presents a new algorithm to design parallel prefix circuits, and constructs a class of depth-size optimal parallel prefix circuits, named SU4, with fan-out 4, which has the smallest depth among all known depth-size optimal prefix circuits with fan-out 4.
24 citations
Cites background or methods from "The power of parallel prefix"
...To accelerate the prefix operation, many parallel prefix algorithms for various parallel computing models have also been proposed [1,7,9,15,17,19,22,23,27,31,32,34–36]....
[...]
...Many have reported its important role in various applications, such as cryptography, binary addition, biological sequence comparison, design of silicon compilers, image processing, job scheduling, loop parallelization, polynomial evaluation, processor allocation, and sorting [1–3,8,10,11,17–19,21,40,42,44,45]....
[...]
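Binary addition, one of the applications listed in the excerpt above, is a standard illustration of why an associative operator matters. The Python sketch below is an assumption-laden example rather than anything taken from the citing paper: it encodes each bit position as a (generate, propagate) pair, combines the pairs with an associative operator, and reads the carries off the prefix results. The names `combine` and `add_bits` are hypothetical, and the scan is done sequentially with `itertools.accumulate`.

```python
from itertools import accumulate


def combine(low, high):
    """Associative operator on (generate, propagate) pairs.

    low describes the less significant block of bits, high the more
    significant one. The combined block generates a carry if high does,
    or if high propagates a carry that low generates.
    """
    g1, p1 = low
    g2, p2 = high
    return (g2 | (p2 & g1), p2 & p1)


def add_bits(a, b):
    """Add two equal-length bit lists given least significant bit first."""
    # Per-position signals: generate = both bits set, propagate = exactly one.
    gp = [(x & y, x ^ y) for x, y in zip(a, b)]
    # Prefix computation over the positions, starting from the low end.
    prefix = list(accumulate(gp, combine))
    # Carry into position 0 is 0; carry into position i + 1 is the
    # generate signal of the prefix covering positions 0..i.
    carries = [0] + [g for g, _ in prefix]
    # Sum bit = propagate XOR incoming carry; the last carry is the top bit.
    return [p ^ c for (_, p), c in zip(gp, carries)] + [carries[-1]]


if __name__ == "__main__":
    # 6 + 7, with both operands written least significant bit first.
    print(add_bits([0, 1, 1], [1, 1, 1]))
```

Since `combine` is associative, the sequential scan here could be replaced by any parallel prefix scheme without changing the result, which is the property that prefix circuits for adders rely on.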