Journal ArticleDOI

The power of parallel prefix

TL;DR: This study presents a deterministic parallel algorithm for the prefix computation problem when the order of the elements is specified by a linked list, assuming the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\left(\frac{\log n}{\log(2n/p)} \cdot \frac{n}{p}\right)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For $p \le O(n^{1-\epsilon})$ ($\epsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
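
To make the problem statement concrete, here is a minimal Python sketch of prefix computation over a linked list using pointer jumping, a standard textbook technique. It is not the algorithm of this paper: it simulates the processors sequentially, needs about log2(n) synchronous rounds with one processor per element, and performs O(n log n) total work, so it does not achieve the linear speedup described in the abstract. The function name `list_prefix`, the array-based list layout, and the use of -1 as a null link are assumptions made for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def list_prefix(values: List[T], succ: List[int], op: Callable[[T, T], T]) -> List[T]:
    """Return, for each node, the product of all elements from the head up to it.

    values[i] is the element stored in node i; succ[i] is the index of the
    next node in list order, or -1 for the tail (hypothetical layout).
    op must be associative; it does not need to be commutative.
    """
    n = len(values)
    # Derive predecessor links from the successor links (-1 marks the head).
    pred = [-1] * n
    for i, s in enumerate(succ):
        if s != -1:
            pred[s] = i

    val = list(values)
    # Each loop iteration simulates one synchronous parallel round:
    # all reads use the old arrays, all writes go to fresh copies.
    while any(p != -1 for p in pred):
        new_val = val[:]
        new_pred = pred[:]
        for i in range(n):
            p = pred[i]
            if p != -1:
                # val[p] covers the segment just before the one val[i] covers,
                # so it goes on the left to preserve list order.
                new_val[i] = op(val[p], val[i])
                new_pred[i] = pred[p]  # jump over the predecessor
        val, pred = new_val, new_pred
    return val


# The list a -> b -> c -> d stored in scrambled memory order.
values = ["c", "a", "d", "b"]
succ = [2, 3, -1, 0]
print(list_prefix(values, succ, lambda x, y: x + y))
# prints ['abc', 'a', 'abcd', 'ab']
```

String concatenation is used in the example because it is associative but not commutative, which makes it easy to see that the list order is respected.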
Citations
Journal ArticleDOI
TL;DR: This paper presents efficient sequential and parallel algorithms for computation of time-slot assignments in SS/TDMA (satellite-switched/time-division multiple-access) systems with variable-bandwidth beams and describes an efficient implementation of the algorithm on a hypercube multiprocessor with P processors.
Abstract: In this paper, we present efficient sequential and parallel algorithms for computation of time-slot assignments in SS/TDMA (satellite-switched/time-division multiple-access) systems with variable-bandwidth beams. These algorithms are based on modeling the time-slot assignment (TSA) problem as a network-flow problem. Our sequential algorithm, in general, has a better time-complexity than a previous algorithm due to Gopal et al. (1982) and generates fewer switching matrices. If $M$ ($N$) is the number of uplink (downlink) beams, $L$ is the length of any optimal TSA, and $\alpha$ is the maximum bandwidth of an uplink or downlink beam, our sequential algorithm takes $O((M+N)^3 \min(MN\alpha, L))$ time to compute an optimal TSA when the traffic-handling capacity of the satellite is of the same order as the total bandwidth of the links. Our parallel algorithm uses $L/2$ processors and has a time-complexity of $O((M+N)^3 \log L)$ on a PRAM model of parallel computation. We then generalize this algorithm to $P \le L/2$ processors and describe an efficient implementation of the algorithm on a hypercube multiprocessor with $P$ processors. A massively-parallel version of the algorithm runs in $O((M+N)^2 \log(M+N) \log L)$ time on $(M+N)L/2$ processors.

9 citations

Journal ArticleDOI
01 Sep 1991
TL;DR: A conceptual graph structure called the detection network that yields efficient algorithms for the detection problem for combinational, message-based, sequential and parallel computing systems is proposed and the problem of computing a detection network with the minimum number of edges is shown to be computationally intractable.
Abstract: Algorithmic issues of simple object detection problems in the context of a system consisting of a finite set of sensors that monitor a workspace are studied. Each sensor detects the presence of only a certain subset of a given set of objects O. Given that an object has been detected by a subset of sensors, the detection problem deals with identifying whether the object in the workspace is a member of O and, if so, computing the maximal set of such numbers. A conceptual graph structure called the detection network that yields efficient algorithms for the detection problem for combinational, message-based, sequential and parallel computing systems is proposed. The problem of computing a detection network with the minimum number of edges is shown to be computationally intractable, and polynomial-time approximation algorithms are presented. Sequential algorithms to solve the detection problem with and without preprocessing are presented. Parallel algorithms on shared memory systems and hypercube-based message passing systems are discussed. It is shown that the problem of detecting multiple objects is computationally intractable.

9 citations

Journal ArticleDOI
Yijie Han
TL;DR: The author presents a deterministic parallel algorithm that computes linked list prefixes for an input list of $n$ elements in time $O(n/p + \log n)$ on a local memory PRAM model using $p$ processors and $p$ shared memory cells.
Abstract: The author presents a deterministic parallel algorithm for the linked list prefix problem. It computes linked list prefixes for an input list of $n$ elements in time $O(n/p + \log n)$ on a local memory PRAM model using $p$ processors and $p$ shared memory cells.

9 citations

Journal ArticleDOI
TL;DR: This paper presents parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models and presents an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs.
Abstract: Given a set of $n$ intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in $O(\log^3 n)$ time using $O(n/\log^2 n)$ processors on the EREW PRAM and using $n$ processors on the hypercubes. For the case of proper interval graphs, our algorithm runs in $O(\log n)$ time using $O(n)$ processors if the input intervals are not given already sorted and using $O(n/\log n)$ processors otherwise, on the EREW PRAM. On $n$-processor hypercubes, our algorithm for the proper interval case takes $O(\log n \log\log n)$ time for unsorted input and $O(\log n)$ time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs.

9 citations


Cites background or methods from "The power of parallel prefix"

  • ...Relabeling can be done by parallel prefix [22], [23] in O(log n) time and O(n/ log n) processors....

  • ...The parallel prefix operation can be performed in O(log n) time using O(n/ log n) processors on the EREW PRAM [22], [23] and in O(log n) time on n-processor hypercubes [24]....

Journal ArticleDOI
TL;DR: It is proved that both associativity and commutativity are undecidable and, more strongly, that the resulting relations fail to be recursively enumerable; these results hold for the kind of function subprograms of practical interest in such a situation.
Abstract: Associativity is required for the use of general scans and reductions in parallel languages. Some systems also require functions used with scans and reductions to be commutative. We prove the undecidability of both associativity and commutativity. Thus, it is impossible in general for a compiler to check for those conditions. We also prove the stronger result that the resulting relations fail to be recursively enumerable. We prove that these results hold for the kind of function subprograms of practical interest in such a situation: function subprograms that, due to syntactical restrictions, are guaranteed to halt. Thus, our results are stronger than one can obtain from Rice's Theorem. We also obtain limitations concerning the construction of functions and limitations concerning compiler-generated run-time checks. In addition, we prove an undecidability result about programmer-constructed run-time checks.

9 citations
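
As a small illustration of why scans require an associative operator, the Python sketch below compares a sequential scan with a blocked scan that regroups the operations, the way a parallel implementation would when it splits the input among processors. The helper name `blocked_scan` and the block size are assumptions made for this example; the code is not taken from the paper above.

```python
from itertools import accumulate
from operator import add, sub


def blocked_scan(xs, op, block):
    """Scan each block on its own, then fold in the total of all earlier blocks.

    This regrouping matches the sequential scan only when op is associative.
    """
    out = []
    carry = None  # running total of everything before the current block
    for start in range(0, len(xs), block):
        local = list(accumulate(xs[start:start + block], op))
        if carry is not None:
            local = [op(carry, v) for v in local]
        out.extend(local)
        carry = local[-1]
    return out


xs = [1, 2, 3, 4]

# Addition is associative: both groupings agree.
print(list(accumulate(xs, add)), blocked_scan(xs, add, 2))
# prints [1, 3, 6, 10] [1, 3, 6, 10]

# Subtraction is not associative: the regrouped scan gives a different answer.
print(list(accumulate(xs, sub)), blocked_scan(xs, sub, 2))
# prints [1, -1, -4, -8] [1, -1, -4, 0]
```

A test like this can expose a non-associative operator on particular inputs, but passing it on some inputs proves nothing in general, which is consistent with the undecidability result stated in the abstract.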