Journal ArticleDOI

The power of parallel prefix

TL;DR: This study solves the prefix computation problem, when the order of the elements is specified by a linked list, under the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).

Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O(((log n)/log(2n/p)) × (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
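For intuition, here is a minimal sequential simulation of the classic pointer-jumping approach to linked-list prefix computation. It is not the paper's work-optimal O(((log n)/log(2n/p)) × (n/p)) algorithm (pointer jumping performs O(n log n) work), and the function and variable names (list_prefix, pred, val) are illustrative only.

```python
def list_prefix(values, pred, op):
    """Prefix computation over a linked list by pointer jumping.

    values[i]: element a_i stored at node i
    pred[i]:   index of node i's predecessor in list order (None at the head)
    op:        any associative binary operation

    After ceil(log2 n) synchronous rounds, the result at node i is
    a_head op ... op a_i, the prefix product ending at node i.
    Sequential simulation of an EREW-style algorithm: each round reads
    only the previous round's arrays, so no memory location is read and
    written concurrently within a round.
    """
    n = len(values)
    val = list(values)
    ptr = list(pred)
    rounds = max(1, (n - 1).bit_length())  # ceil(log2 n) for n >= 2
    for _ in range(rounds):
        new_val, new_ptr = list(val), list(ptr)
        for i in range(n):                       # "for all i in parallel"
            j = ptr[i]
            if j is not None:
                new_val[i] = op(val[j], val[i])  # predecessor's segment comes first
                new_ptr[i] = ptr[j]              # jump the pointer
        val, ptr = new_val, new_ptr
    return val


if __name__ == "__main__":
    # list order 0 -> 1 -> 2 -> 3 with values 1..4 under addition
    print(list_prefix([1, 2, 3, 4], [None, 0, 1, 2], lambda x, y: x + y))
    # -> [1, 3, 6, 10]
```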
Citations
Journal ArticleDOI
TL;DR: It is shown that U can be represented among the n nodes of a variant of the mesh of trees using O((m/n) polylog(m/n)) storage per node such that any n-tuple of variables may be accessed in O(log n (log log n)^2) time in the worst case, for m polynomial in n.
Abstract: The problem of representing a set U ≜ {u_1, ..., u_m} of read-write variables on an n-node distributed-memory parallel computer is considered. It is shown that U can be represented among the n nodes of a variant of the mesh of trees using O((m/n) polylog(m/n)) storage per node such that any n-tuple of variables may be accessed in O(log n (log log n)^2) time in the worst case, for m polynomial in n.

6 citations

Journal Article
TL;DR: Relative merits and drawbacks of parallel prefix algorithms are described and illustrated to provide insights into when and why the presented algorithms can be best used.
Abstract: New families of computation-efficient parallel prefix algorithms for message-passing multicomputers are presented. The first family improves the communication time of a previous family of parallel prefix algorithms; both use only half-duplex communications. Two other families adopt collective communication operations to reduce the communication times of the former two, respectively. The precondition of the presented algorithms is also given. These families each provide the flexibility of either fewer computation time steps or fewer communication time steps to achieve the minimal running time depending on the ratio of the time required by a communication step to the time required by a computation step. Relative merits and drawbacks of parallel prefix algorithms are described and illustrated to provide insights into when and why the presented algorithms can be best used.
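For context, the sketch below simulates the standard recursive-doubling prefix (scan) pattern commonly used on message-passing machines: in round d, every process exchanges its running block total with the partner whose rank differs in bit d. This is not one of the algorithm families from the cited paper; it assumes the number of processes is a power of two and full-duplex pairwise exchange, and all names are illustrative.

```python
def recursive_doubling_scan(local_values, op):
    """Inclusive prefix over one value per 'process', by recursive doubling.

    local_values[i] is the value held by process i; p = len(local_values)
    must be a power of two.  In round d, process i exchanges its running
    block total with partner i XOR 2^d (simulated here with plain lists
    instead of real sends and receives).  After log2 p rounds, prefix[i]
    equals local_values[0] op ... op local_values[i].
    """
    p = len(local_values)
    assert p & (p - 1) == 0, "p must be a power of two in this sketch"
    prefix = list(local_values)   # running prefix at each process
    total = list(local_values)    # running block total at each process
    d = 1
    while d < p:
        snapshot = list(total)    # totals as they stood when the round's messages were sent
        for i in range(p):        # "for all processes in parallel"
            partner = i ^ d
            if partner < i:
                # partner's block precedes ours: fold it in on the left
                prefix[i] = op(snapshot[partner], prefix[i])
                total[i] = op(snapshot[partner], total[i])
            else:
                # partner's block follows ours: only the block total grows
                total[i] = op(total[i], snapshot[partner])
        d *= 2
    return prefix


if __name__ == "__main__":
    print(recursive_doubling_scan([1, 2, 3, 4], lambda x, y: x + y))  # [1, 3, 6, 10]
```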

6 citations


Cites background from "The power of parallel prefix"

  • ...Relative merits and drawbacks of parallel prefix algorithms are described and illustrated to provide insights into when and why the presented algorithms can be best used....

    [...]

Journal Article
TL;DR: A family of computation-efficient parallel prefix algorithms for message-passing multicomputers that provide the flexibility of choosing either less computation time or less communication time, depending on the characteristics of the target machine, to achieve the minimal running time.
Abstract: A family of computation-efficient parallel prefix algorithms for message-passing multicomputers is presented. The family generalizes a previous algorithm that uses only half-duplex communications, and thus can improve the running time. Several properties of the family are derived, including the number of computation steps, the number of communication steps, and the condition for effective use of the family. The family can adopt collective communication operations to reduce the communication time, and thus becomes a second family of algorithms. These algorithms provide the flexibility of choosing either less computation time or less communication time, depending on the characteristics of the target machine, to achieve the minimal running time.

6 citations


Additional excerpts

  • ...Key-Words: - Computation-efficient, Cost optimality, Half-duplex, Message-passing multicomputers, Parallel algorithms, Prefix computation...

    [...]

Journal ArticleDOI
TL;DR: This paper presents an efficient parallel algorithm for finding approximate solutions to the 0–1 knapsack problem; the algorithm takes an ε, 0 < ε < 1, as a parameter and computes a solution whose deviation from the optimal solution is at most a fraction ε of the optimal solution.
Abstract: Computing an optimal solution to the knapsack problem is known to be NP-hard. Consequently, fast parallel algorithms for finding such a solution without using an exponential number of processors appear unlikely. An attractive alternative is to compute an approximate solution to this problem rapidly using a polynomial number of processors. In this paper, we present an efficient parallel algorithm for finding approximate solutions to the 0–1 knapsack problem. Our algorithm takes an ε, 0 < ε < 1, as a parameter and computes a solution whose deviation from the optimal solution is at most a fraction ε of the optimal solution. For a problem instance having n items, this computation uses O(n^(5/2)/ε^(3/2)) processors and requires O(log^3 n + log^2 n log(1/ε)) time. The upper bound on the processor requirement of our algorithm is established by reducing it to a problem on weighted bipartite graphs. This processor complexity is a significant improvement over that of other known parallel algorithms for this problem.
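To make the approximation guarantee concrete, here is a minimal sequential sketch of the usual profit-scaling scheme behind such (1 − ε)-approximations: scale profits down so the dynamic-programming table has polynomial size, then take the best scaled profit that fits. It is not the parallel algorithm of the cited paper (which parallelizes this kind of computation and bounds the processor count via weighted bipartite graphs); the function name and the choice of scaling factor are illustrative assumptions.

```python
def knapsack_fptas(profits, weights, capacity, eps):
    """(1 - eps)-approximate 0-1 knapsack by profit scaling plus DP.

    Assumes positive profits and weights and 0 < eps < 1.  Profits are
    scaled by K = eps * max(profits) / n, so the DP over scaled profit
    has roughly O(n^2 / eps) entries, and the chosen set's true profit
    is within a factor (1 - eps) of optimal.  Sequential sketch only.
    """
    n = len(profits)
    K = eps * max(profits) / n
    scaled = [int(p / K) for p in profits]

    # min_weight[q] = lightest feasible subset achieving scaled profit exactly q;
    # take[q] remembers that subset so a concrete solution can be reported.
    max_q = sum(scaled)
    INF = float("inf")
    min_weight = [0.0] + [INF] * max_q
    take = [[] for _ in range(max_q + 1)]
    for i in range(n):
        for q in range(max_q, scaled[i] - 1, -1):   # descending q: 0-1 semantics
            cand = min_weight[q - scaled[i]] + weights[i]
            if cand < min_weight[q] and cand <= capacity:
                min_weight[q] = cand
                take[q] = take[q - scaled[i]] + [i]
    best_q = max(q for q in range(max_q + 1) if min_weight[q] <= capacity)
    chosen = take[best_q]
    return chosen, sum(profits[i] for i in chosen)


if __name__ == "__main__":
    items, value = knapsack_fptas([60, 100, 120], [10, 20, 30], 50, eps=0.1)
    print(items, value)  # expect items [1, 2] with profit 220 for this instance
```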

6 citations

Book ChapterDOI
Lin Chen
01 Feb 1991
TL;DR: Several fast deterministic algorithms are presented, including an optimal algorithm which sorts n distinct integers in O(log n) time using O(n/log n) processors on the EREW PRAM, for the case where the integers are in a range linear in n.
Abstract: The main result of this paper is several fast deterministic algorithms, including:
  • an optimal algorithm which sorts n distinct integers in O(log n) time using O(n/log n) processors on the EREW PRAM, for the case where the integers are in a range linear in n;
  • an optimal algorithm which sorts n integers in O(log n/log log n) time using O(n log log n/log n) processors on the CRCW PRAM, for the case where the integers are in a range linear in n and a constant upper-bounded number of integers have a constant lower-bounded multiplicity.
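As background on why a linear range matters, the sketch below shows the sequential counting-sort idea that such PRAM results parallelize: for integers in a range O(n), counting occurrences and prefix-summing the counts sorts in O(n) work; the position computation is itself a prefix computation of the kind studied in the surveyed paper. The parallel EREW/CRCW constructions of the cited chapter are far more involved; this sketch is only illustrative.

```python
def counting_sort(a, max_value):
    """Stable sort of integers drawn from [0, max_value] in O(n + max_value) time.

    For a range linear in n (max_value = O(n)) this is O(n) total work.
    Sequential illustration; the cited chapter achieves O(log n) time with
    O(n / log n) processors on the EREW PRAM for this setting.
    """
    count = [0] * (max_value + 1)
    for x in a:
        count[x] += 1
    # Exclusive prefix sums over the counts give each key's first output slot --
    # an instance of the prefix computation problem.
    pos = [0] * (max_value + 1)
    for k in range(1, max_value + 1):
        pos[k] = pos[k - 1] + count[k - 1]
    out = [0] * len(a)
    for x in a:
        out[pos[x]] = x
        pos[x] += 1
    return out


if __name__ == "__main__":
    print(counting_sort([3, 1, 4, 1, 5, 2, 0], max_value=6))  # [0, 1, 1, 2, 3, 4, 5]
```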

6 citations


Cites methods from "The power of parallel prefix"

  • ...[15] gave an algorithm which runs in O(log n) time with O(n/log n) processors on EREW PRAM....

    [...]