Journal ArticleDOI

The power of parallel prefix

TL;DR: This study solves the prefix computation problem, when the order of the elements is specified by a linked list, under the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).

Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O(((log n)/log(2n/p)) × (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
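For intuition, here is a minimal sequential simulation of the classic pointer-jumping approach to linked-list prefix computation. It is not the paper's work-optimal O(((log n)/log(2n/p)) × (n/p)) algorithm (pointer jumping performs O(n log n) work), and the function and variable names (list_prefix, pred, val) are illustrative only.

```python
def list_prefix(values, pred, op):
    """Prefix computation over a linked list by pointer jumping.

    values[i]: element a_i stored at node i
    pred[i]:   index of node i's predecessor in list order (None at the head)
    op:        any associative binary operation

    After ceil(log2 n) synchronous rounds, the result at node i is
    a_head op ... op a_i, the prefix product ending at node i.
    Sequential simulation of an EREW-style algorithm: each round reads
    only the previous round's arrays, so no memory location is read and
    written concurrently within a round.
    """
    n = len(values)
    val = list(values)
    ptr = list(pred)
    rounds = max(1, (n - 1).bit_length())  # ceil(log2 n) for n >= 2
    for _ in range(rounds):
        new_val, new_ptr = list(val), list(ptr)
        for i in range(n):                       # "for all i in parallel"
            j = ptr[i]
            if j is not None:
                new_val[i] = op(val[j], val[i])  # predecessor's segment comes first
                new_ptr[i] = ptr[j]              # jump the pointer
        val, ptr = new_val, new_ptr
    return val


if __name__ == "__main__":
    # list order 0 -> 1 -> 2 -> 3 with values 1..4 under addition
    print(list_prefix([1, 2, 3, 4], [None, 0, 1, 2], lambda x, y: x + y))
    # -> [1, 3, 6, 10]
```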
Citations
Journal ArticleDOI
TL;DR: It is shown that U can be represented among the n nodes of a variant of the mesh of trees using O((m/n) polylog(m/n)) storage per node such that any n-tuple of variables may be accessed in O(log n (log log n)^2) time in the worst case, for m polynomial in n.
Abstract: The problem of representing a set U ≜ {u_1, ..., u_m} of read-write variables on an n-node distributed-memory parallel computer is considered. It is shown that U can be represented among the n nodes of a variant of the mesh of trees using O((m/n) polylog(m/n)) storage per node such that any n-tuple of variables may be accessed in O(log n (log log n)^2) time in the worst case, for m polynomial in n.

6 citations

Journal Article
TL;DR: Relative merits and drawbacks of parallel prefix algorithms are described and illustrated to provide insights into when and why the presented algorithms can be best used.
Abstract: New families of computation-efficient parallel prefix algorithms for message-passing multicomputers are presented. The first family improves the communication time of a previous family of parallel prefix algorithms; both use only half-duplex communications. Two other families adopt collective communication operations to reduce the communication times of the former two, respectively. The precondition of the presented algorithms is also given. These families each provide the flexibility of either fewer computation time steps or fewer communication time steps to achieve the minimal running time depending on the ratio of the time required by a communication step to the time required by a computation step. Relative merits and drawbacks of parallel prefix algorithms are described and illustrated to provide insights into when and why the presented algorithms can be best used.
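For context, the sketch below simulates the standard recursive-doubling prefix (scan) pattern commonly used on message-passing machines: in round d, every process exchanges its running block total with the partner whose rank differs in bit d. This is not one of the algorithm families from the cited paper; it assumes the number of processes is a power of two and full-duplex pairwise exchange, and all names are illustrative.

```python
def recursive_doubling_scan(local_values, op):
    """Inclusive prefix over one value per 'process', by recursive doubling.

    local_values[i] is the value held by process i; p = len(local_values)
    must be a power of two.  In round d, process i exchanges its running
    block total with partner i XOR 2^d (simulated here with plain lists
    instead of real sends and receives).  After log2 p rounds, prefix[i]
    equals local_values[0] op ... op local_values[i].
    """
    p = len(local_values)
    assert p & (p - 1) == 0, "p must be a power of two in this sketch"
    prefix = list(local_values)   # running prefix at each process
    total = list(local_values)    # running block total at each process
    d = 1
    while d < p:
        snapshot = list(total)    # totals as they stood when the round's messages were sent
        for i in range(p):        # "for all processes in parallel"
            partner = i ^ d
            if partner < i:
                # partner's block precedes ours: fold it in on the left
                prefix[i] = op(snapshot[partner], prefix[i])
                total[i] = op(snapshot[partner], total[i])
            else:
                # partner's block follows ours: only the block total grows
                total[i] = op(total[i], snapshot[partner])
        d *= 2
    return prefix


if __name__ == "__main__":
    print(recursive_doubling_scan([1, 2, 3, 4], lambda x, y: x + y))  # [1, 3, 6, 10]
```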

6 citations


Cites background from "The power of parallel prefix"

  • ...Relative merits and drawbacks of parallel prefix algorithms are described and illustrated to provide insights into when and why the presented algorithms can be best used....

    [...]

Journal Article
TL;DR: A family of computation-efficient parallel prefix algorithms for message-passing multicomputers that provide the flexibility of choosing either less computation time or less communication time, depending on the characteristics of the target machine, to achieve the minimal running time.
Abstract: A family of computation-efficient parallel prefix algorithms for message-passing multicomputers is presented. The family generalizes a previous algorithm that uses only half-duplex communications, and thus can improve the running time. Several properties of the family are derived, including the number of computation steps, the number of communication steps, and the condition for effective use of the family. The family can adopt collective communication operations to reduce the communication time, and thus becomes a second family of algorithms. These algorithms provide the flexibility of choosing either less computation time or less communication time, depending on the characteristics of the target machine, to achieve the minimal running time.

6 citations


Additional excerpts

  • ...Key-Words: - Computation-efficient, Cost optimality, Half-duplex, Message-passing multicomputers, Parallel algorithms, Prefix computation...

    [...]

Journal ArticleDOI
TL;DR: This paper presents an efficient parallel algorithm for finding approximate solutions to the 0–1 knapsack problem; the algorithm takes an ε, 0 < ε < 1, as a parameter and computes a solution whose deviation from the optimal solution is at most a fraction ε of the optimal solution.
Abstract: Computing an optimal solution to the knapsack problem is known to be NP-hard. Consequently, fast parallel algorithms for finding such a solution without using an exponential number of processors appear unlikely. An attractive alternative is to compute an approximate solution to this problem rapidly using a polynomial number of processors. In this paper, we present an efficient parallel algorithm for finding approximate solutions to the 0–1 knapsack problem. Our algorithm takes an ε, 0 < ε < 1, as a parameter and computes a solution whose deviation from the optimal solution is at most a fraction ε of the optimal solution. For a problem instance having n items, this computation uses O(n^(5/2)/ε^(3/2)) processors and requires O(log^3 n + log^2 n log(1/ε)) time. The upper bound on the processor requirement of our algorithm is established by reducing it to a problem on weighted bipartite graphs. This processor complexity is a significant improvement over that of other known parallel algorithms for this problem.
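To make the approximation guarantee concrete, here is a minimal sequential sketch of the usual profit-scaling scheme behind such (1 − ε)-approximations: scale profits down so the dynamic-programming table has polynomial size, then take the best scaled profit that fits. It is not the parallel algorithm of the cited paper (which parallelizes this kind of computation and bounds the processor count via weighted bipartite graphs); the function name and the choice of scaling factor are illustrative assumptions.

```python
def knapsack_fptas(profits, weights, capacity, eps):
    """(1 - eps)-approximate 0-1 knapsack by profit scaling plus DP.

    Assumes positive profits and weights and 0 < eps < 1.  Profits are
    scaled by K = eps * max(profits) / n, so the DP over scaled profit
    has roughly O(n^2 / eps) entries, and the chosen set's true profit
    is within a factor (1 - eps) of optimal.  Sequential sketch only.
    """
    n = len(profits)
    K = eps * max(profits) / n
    scaled = [int(p / K) for p in profits]

    # min_weight[q] = lightest feasible subset achieving scaled profit exactly q;
    # take[q] remembers that subset so a concrete solution can be reported.
    max_q = sum(scaled)
    INF = float("inf")
    min_weight = [0.0] + [INF] * max_q
    take = [[] for _ in range(max_q + 1)]
    for i in range(n):
        for q in range(max_q, scaled[i] - 1, -1):   # descending q: 0-1 semantics
            cand = min_weight[q - scaled[i]] + weights[i]
            if cand < min_weight[q] and cand <= capacity:
                min_weight[q] = cand
                take[q] = take[q - scaled[i]] + [i]
    best_q = max(q for q in range(max_q + 1) if min_weight[q] <= capacity)
    chosen = take[best_q]
    return chosen, sum(profits[i] for i in chosen)


if __name__ == "__main__":
    items, value = knapsack_fptas([60, 100, 120], [10, 20, 30], 50, eps=0.1)
    print(items, value)  # expect items [1, 2] with profit 220 for this instance
```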

6 citations

Book ChapterDOI
Lin Chen
01 Feb 1991
TL;DR: Several fast deterministic algorithms are presented, including an optimal algorithm which sorts n distinct integers in O(log n) time using O(n/log n) processors on the EREW PRAM, for the case where the integers are in a range linear in n.
Abstract: The main result of this paper is several fast deterministic algorithms, including:
  • an optimal algorithm which sorts n distinct integers in O(log n) time using O(n/log n) processors on the EREW PRAM, for the case where the integers are in a range linear in n;
  • an optimal algorithm which sorts n integers in O(log n/log log n) time using O(n log log n/log n) processors on the CRCW PRAM, for the case where the integers are in a range linear in n and a constant upper-bounded number of integers have a constant lower-bounded multiplicity.
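As background on why a linear range matters, the sketch below shows the sequential counting-sort idea that such PRAM results parallelize: for integers in a range O(n), counting occurrences and prefix-summing the counts sorts in O(n) work; the position computation is itself a prefix computation of the kind studied in the surveyed paper. The parallel EREW/CRCW constructions of the cited chapter are far more involved; this sketch is only illustrative.

```python
def counting_sort(a, max_value):
    """Stable sort of integers drawn from [0, max_value] in O(n + max_value) time.

    For a range linear in n (max_value = O(n)) this is O(n) total work.
    Sequential illustration; the cited chapter achieves O(log n) time with
    O(n / log n) processors on the EREW PRAM for this setting.
    """
    count = [0] * (max_value + 1)
    for x in a:
        count[x] += 1
    # Exclusive prefix sums over the counts give each key's first output slot --
    # an instance of the prefix computation problem.
    pos = [0] * (max_value + 1)
    for k in range(1, max_value + 1):
        pos[k] = pos[k - 1] + count[k - 1]
    out = [0] * len(a)
    for x in a:
        out[pos[x]] = x
        pos[x] += 1
    return out


if __name__ == "__main__":
    print(counting_sort([3, 1, 4, 1, 5, 2, 0], max_value=6))  # [0, 1, 1, 2, 3, 4, 5]
```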

6 citations


Cites methods from "The power of parallel prefix"

  • ...[15] gave an algorithm which runs in O(log n) time with O(n/log n) processors on EREW PRAM....

    [...]