scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Parallel Prefix Computation

01 Oct 1980-Journal of the ACM (ACM)-Vol. 27, Iss: 4, pp 831-838
TL;DR: A recurstve construction is used to obtain a product circuit for solving the prefix problem and a Boolean clrcmt which has depth 2[Iog2n] + 2 and size bounded by 14n is obtained for n-bit binary addmon.
Abstract: The prefix problem is to compute all the products x t o x2 . . . . o xk for i ~ k .~ n, where o is an associative operation A recurstve construction IS used to obtain a product circuit for solving the prefix problem which has depth exactly [log:n] and size bounded by 4n An application yields fast, small Boolean ctrcmts to simulate fimte-state transducers. By simulating a sequentml adder, a Boolean clrcmt which has depth 2[Iog2n] + 2 and size bounded by 14n Is obtained for n-bit binary addmon The size can be decreased significantly by permitting the depth to increase by an addmve constant

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

20,309 citations

Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal ArticleDOI
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.
Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled on to this model; yet it can be effeciently implemented in hardware. The author argues that an analogous bridge between software and hardware in required for parallel computation if that is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.

3,885 citations

Journal ArticleDOI
TL;DR: It is shown that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.
Abstract: With VLSI architecture, the chip area and design regularity represent a better measure of cost than the conventional gate count. We show that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.

1,147 citations


Cites methods from "Parallel Prefix Computation"

  • ...Although the same idea was used by Ladner and Fischer [8], their results are not directly applicable because they ignored fan-out restrictions and used the gate count rather than area as a complexity measure....

    [...]

  • ...Although the same idea was used by Ladner and Fischer [8], their results are not directly applicable because they ignored fan out restrictions and used the gate count rather than area as a complexity measure....

    [...]

Proceedings ArticleDOI
C. Ranger1, R. Raghuraman1, A. Penmetsa1, Gary Bradski1, Christos Kozyrakis1 
10 Feb 2007
TL;DR: It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
Abstract: This paper evaluates the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system. We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as P-threads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code

1,058 citations

References
More filters
Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations

Book
01 Jan 1967

350 citations

Book
01 Jan 1976

290 citations

Journal ArticleDOI
TL;DR: The length l of addition chains for z is shown to be bounded from below by log 2 z + log 2 s ( z )−2.13, where s (Z) denotes the sum of the digits in the binary expansion of z and the proof will also hold for addition-subtraction chains if s (z ) is replaced by an appropriate substitute.

93 citations