Journal ArticleDOI

The power of parallel prefix

TL;DR: This study solves the prefix computation problem when the order of the elements is specified by a linked list, assuming the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products a1 * ... * ai, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O(((log n)/log(2n/p)) × (n/p))-time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
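The linked-list prefix computation described in the abstract can be illustrated with a sequential simulation of the classic pointer-jumping technique. This is a minimal sketch: the names `list_prefix`, `succ`, `val`, and `op` are illustrative and not from the paper, and the paper's actual algorithm is considerably more involved (it achieves linear speedup, whereas plain pointer jumping does O(n log n) total work).

```python
# Minimal sketch: sequential simulation of pointer jumping on a linked
# list (illustrative names; not the paper's linear-work algorithm).
# For simplicity this computes, for each node, the product of the
# elements from that node to the tail; head-to-node prefixes are
# symmetric (follow predecessor pointers instead).
import operator

def list_prefix(succ, val, op=operator.mul):
    """succ[i]: index of the node after i (None at the tail).
    val[i]: node i's element; op: any associative operation."""
    n = len(val)
    val, succ = list(val), list(succ)
    # Each round simulates one parallel step: every node combines its
    # value with its successor's, then jumps its pointer ahead. After
    # round k, val[i] covers up to 2^k consecutive elements, so
    # O(log n) rounds suffice -- at O(n log n) total work, however,
    # not the linear work the paper achieves.
    for _ in range(max(1, n.bit_length())):
        new_val, new_succ = list(val), list(succ)
        for i in range(n):
            j = succ[i]
            if j is not None:
                new_val[i] = op(val[i], val[j])  # merge adjacent blocks
                new_succ[i] = succ[j]            # pointer jumping
        val, succ = new_val, new_succ
    return val
```

For example, on the list 0 → 1 → 2 → 3 with values [1, 2, 3, 4], `list_prefix([1, 2, 3, None], [1, 2, 3, 4])` returns [24, 24, 12, 4], each node's product through the tail.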
Citations
Proceedings ArticleDOI
26 May 2011
TL;DR: In this article, a parallel prefix algorithm for message-passing multicomputers is presented, which uses only half-duplex communications and provides the flexibility of choosing parameter values for either fewer computation time steps or fewer communication time steps to achieve the minimal running time based on the ratio of the time required by a communication step to the time of a computation step.
Abstract: A new computation-efficient parallel prefix algorithm for message-passing multicomputers is presented. The algorithm uses only half-duplex communications. It provides the flexibility of choosing parameter values for either fewer computation time steps or fewer communication time steps to achieve the minimal running time based on the ratio of the time required by a communication step to the time required by a computation step. Thus, under certain conditions, the new algorithm can run faster than previous ones for the same multicomputer model.

1 citation

Posted Content
TL;DR: A parallel (EREW PRAM) algorithm is presented showing that when a linked list is contracted from size n to size n/c for a suitable constant c, it can be packed into an array of size n/d for a constant 1 < d ≤ c in the time of 3-coloring the list.
Abstract: We present a parallel (EREW PRAM) algorithm for linked list contraction. We show that when we contract a linked list from size $n$ to size $n/c$ for a suitable constant $c$, we can pack the linked list into an array of size $n/d$ for a constant $1 < d\leq c$ in the time of 3-coloring the list. Thus, for a set of linked lists with a total of $n$ elements in which the longest list has $l$ elements, our algorithm contracts them in $O(n\log i/p+(\log^{(i)}n+\log i)\log \log l+\log l)$ time, for an arbitrary constructible integer $i$, with $p$ processors on the EREW PRAM, where $\log^{(1)} n =\log n$, $\log^{(t)}n=\log \log^{(t-1)} n$, and $\log^*n=\min \{ i|\log^{(i)} n < 10\}$. When $i$ is a constant we get time $O(n/p+\log^{(i)}n\log \log l+\log l)$. Thus, when $l=\Omega (\log^{(c)}n)$ for any constant $c$, we achieve $O(n/p+\log l)$ time. The previous best deterministic EREW PRAM algorithm has time $O(n/p+\log n)$ and the best CRCW PRAM algorithm has time $O(n/p+\log n/\log \log n+\log l)$. Keywords: parallel algorithms, linked list, linked list contraction, uniform linked list contraction, EREW PRAM.

1 citation

DissertationDOI
01 Jan 2014
TL;DR: By modeling how list ranking algorithms retrieve information on the structure of the list in memory, a lower bound is given that is quadratic in sorting complexity for certain parameter settings, yielding the first non-trivial lower bounds for list ranking in the bulk synchronous parallel and MapReduce models.
Abstract: The performance of many algorithms on large input instances substantially depends on the number of triggered cache misses instead of the number of executed operations. This behavior is captured by the external memory model in a natural way. It models a computer by a fast cache of bounded size and a conceptually infinite (external) memory. In contrast to the classical RAM model, the complexity measure is the number of cache lines transferred between the cache and the memory. Computations on elements in the cache are not counted. Recent trends in processor design and advances in big data computing require massively parallel algorithms. The parallel external memory (PEM) model extends the external memory model so that it also captures parallelism. It consists of multiple processors, each with a private cache, that share the (external) memory. This thesis considers three computational problems in the context of (parallel) external memory algorithms. For the fundamental problem of list ranking, an algorithm was previously known that has sorting complexity for many settings of the PEM model. In the first part of this thesis, this algorithm is complemented by matching lower bounds for most practical settings. Interestingly, a stronger lower bound is shown for parameter ranges that had not previously been considered. By modeling how list ranking algorithms retrieve information on the structure of the list in memory, we give a lower bound that is quadratic in sorting complexity for certain parameter settings. It is noteworthy that this result implies the first non-trivial lower bounds for list ranking in the bulk synchronous parallel and MapReduce models. These lower bounds are complemented by a list ranking algorithm which is, in contrast to previous algorithms, analyzed for all parameter settings of the PEM model. In the second part, an efficient PEM algorithm is presented for computing a tree decomposition of bounded width for a graph. The main challenge is to implement a load balancing strategy such that the running [...]
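The list ranking problem at the center of the thesis's first part admits a compact illustration. The sketch below (the names `list_rank` and `succ` are illustrative, not the thesis's) simulates the textbook pointer-jumping solution, whose scattered pointer chasing is exactly the access pattern that triggers cache misses in the (parallel) external memory model:

```python
# Minimal sketch of list ranking: given a linked list, compute each
# node's distance to the tail. Sequential simulation of pointer
# jumping; PEM-efficient algorithms are far more intricate, since
# following pointers at random is what causes cache-line transfers.

def list_rank(succ):
    """succ[i]: index of the node after i, or None at the tail."""
    n = len(succ)
    rank = [0 if s is None else 1 for s in succ]  # 1 hop per live pointer
    succ = list(succ)
    for _ in range(max(1, n.bit_length())):  # O(log n) jumping rounds
        new_rank, new_succ = list(rank), list(succ)
        for i in range(n):
            j = succ[i]
            if j is not None:
                new_rank[i] = rank[i] + rank[j]  # add the skipped distance
                new_succ[i] = succ[j]            # jump over the successor
        rank, succ = new_rank, new_succ
    return rank
```

On the list 0 → 1 → 2 → 3, `list_rank([1, 2, 3, None])` returns [3, 2, 1, 0]. In a RAM-style analysis this costs O(n log n) work; the point of the external memory treatment is that the cache-miss count, not the operation count, dominates on large instances.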

1 citation

01 Jan 1985
TL;DR: Two parallel algorithms are presented for constructing the Voronoi diagram of a set of line segments in the plane.
Abstract: We present two parallel algorithms for constructing the Voronoi diagram of a set of n > 0 line segments in the plane: a) The first algorithm runs in O(log² n) time using O(n) processors. This improves the previous best results (by A. Chow and also by Aggarwal, Chazelle, Guibas, O'Dunlaing and Yap) in two respects. First, we improve the running time by a factor of O(log n), and second, the original results allow only sets of points. b) By using O(n^(1+ε)) processors, for any ε > 0, we improve the running time to O(log n). This is the fastest known algorithm using a subquadratic number of processors. The results combine a number of techniques: a new O(log n) method for point location in certain tree-shaped Voronoi diagrams, a method of Aggarwal et al. for reducing contour tracing to merging tree-shaped Voronoi diagrams, and a technique of Yap for computing the Voronoi diagrams of line segments. The computational model we use is the CREW PRAM (Concurrent-Read, Exclusive-Write Parallel RAM).

1 citation


Cites methods from "The power of parallel prefix"

  • ...Sort E1 and E' in O(log n) time with O(n) processors along the y-direction using parallel prefix [10] (every edge can determine in O(1) time its predecessor)....

    [...]

Journal ArticleDOI
TL;DR: This paper considers the problem of matching image curves against a database of object curve models on massively parallel computers such as the Connection Machine by iteratively finding the longest common subcurve.

1 citation