Journal ArticleDOI

A bridging model for parallel computation

01 Aug 1990-Communications of The ACM (ACM)-Vol. 33, Iss: 8, pp 103-111
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model, with results quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled on to this model; yet it can be efficiently implemented in hardware. The author argues that an analogous bridge between software and hardware is required for parallel computation if that is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
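
A BSP computation proceeds in supersteps: each processor computes on locally held data, exchanges messages, and then waits at a barrier before the next superstep begins. The sketch below is only a single-machine illustration of that structure, not anything from the paper; the processor count, the use of Python threads, and the double-barrier buffer swap are assumptions made for the sketch.

```python
# Illustrative single-machine sketch of the BSP superstep structure:
# local computation, then communication, then bulk synchronization.
import threading

P = 4                                         # assumed number of "processors"
current = [[] for _ in range(P)]              # messages delivered this superstep
pending = [[] for _ in range(P)]              # messages queued for the next superstep
barrier = threading.Barrier(P)

def worker(pid, value, supersteps=3):
    global current, pending
    for _ in range(supersteps):
        # 1. local computation on messages from the previous superstep
        value += sum(current[pid])
        # 2. communication: queue a message for the next processor
        pending[(pid + 1) % P].append(value)
        # 3. barrier: no processor proceeds until all have finished this superstep
        barrier.wait()
        if pid == 0:                          # one thread makes queued messages visible
            current, pending = pending, [[] for _ in range(P)]
        barrier.wait()

threads = [threading.Thread(target=worker, args=(pid, pid)) for pid in range(P)]
for t in threads: t.start()
for t in threads: t.join()
print(current)                                # messages produced in the final superstep
```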
Citations
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets; the implementation runs on large clusters of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
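
The classic word-count example makes the split between the two user-supplied functions concrete. The sketch below is a single-process illustration of the map/reduce style described above, not Google's implementation; the in-memory dictionary standing in for the distributed shuffle is an assumption made for brevity.

```python
# Single-process word count in the MapReduce style: map emits (key, value)
# pairs, a grouping step stands in for the shuffle, and reduce merges all
# values associated with one key.
from collections import defaultdict

def map_fn(document):
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    return (word, sum(counts))

def map_reduce(documents):
    intermediate = defaultdict(list)          # in-memory stand-in for the shuffle
    for doc in documents:
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    return [reduce_fn(key, values) for key, values in intermediate.items()]

print(map_reduce(["the quick brown fox", "the lazy dog", "the fox"]))
# [('the', 3), ('quick', 1), ('brown', 1), ('fox', 2), ('lazy', 1), ('dog', 1)]
```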

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations


Cites background from "A bridging model for parallel computation"

  • ...Bulk Synchronous Programming [17] and some MPI primitives [11] provide higher-level abstractions that make it easier for programmers to write parallel programs....


Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
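
As one concrete instance, the lasso problem minimize (1/2)‖Ax − b‖² + λ‖z‖₁ subject to x = z splits into the three updates below. This is a minimal NumPy sketch of the standard scaled-form ADMM updates for that splitting; the penalty parameter rho, the fixed iteration count, and the absence of a stopping criterion are simplifications for the example, not the review's recommended implementation.

```python
# Minimal NumPy sketch of ADMM for the lasso, split as x = z (illustrative only).
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    n = A.shape[1]
    x = z = u = np.zeros(n)
    Atb = A.T @ b
    M = A.T @ A + rho * np.eye(n)                     # same matrix every iteration
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))   # x-update: ridge-like solve
        z = soft_threshold(x + u, lam / rho)          # z-update: prox of the l1 term
        u = u + x - z                                 # scaled dual-variable update
    return z

# tiny example: recover a sparse vector from a random design
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(lasso_admm(A, b, lam=1.0), 2))
```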

17,433 citations


Cites background from "A bridging model for parallel computation"

  • ...Many modern graph frameworks are based on or inspired by Valiant’s bulk-synchronous parallel (BSP) model [164] for parallel computation....


Proceedings Article
25 Apr 2012
TL;DR: Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner, are presented and implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
Abstract: We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.
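
The central idea is that an RDD records how it was derived (its lineage of coarse-grained transformations) rather than replicating its data, so a lost partition can be recomputed by replaying the lineage. The toy class below is not Spark's API; the names and the eager list-based evaluation are invented purely to illustrate lineage-based recomputation.

```python
# Toy illustration (not Spark's API): a dataset that records its lineage of
# coarse-grained transformations and can recompute itself from the source.
class ToyRDD:
    def __init__(self, source, lineage=()):
        self.source = source                  # base data, e.g. lines read from a file
        self.lineage = lineage                # sequence of ("map"/"filter", function) steps

    def map(self, f):
        return ToyRDD(self.source, self.lineage + (("map", f),))

    def filter(self, p):
        return ToyRDD(self.source, self.lineage + (("filter", p),))

    def collect(self):
        # replay the lineage from the source; if a cached partition were lost,
        # the same replay would reconstruct it without replicating the data
        data = list(self.source)
        for op, f in self.lineage:
            data = [f(x) for x in data] if op == "map" else [x for x in data if f(x)]
        return data

logs = ToyRDD(["INFO ok", "ERROR disk", "ERROR net", "INFO ok"])
errors = logs.filter(lambda line: line.startswith("ERROR")).map(str.upper)
print(errors.collect())                       # ['ERROR DISK', 'ERROR NET']
```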

4,151 citations


Cites methods from "A bridging model for parallel computation"

  • ...Pregel [21] is a programming model for graph applications based on the Bulk Synchronous Parallel paradigm [32]....


Proceedings ArticleDOI
06 Jun 2010
TL;DR: A model for processing large graphs, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
Abstract: Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
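
As a small, single-process illustration of this vertex-centric superstep loop (not the paper's system), the sketch below propagates the maximum value in a graph: each vertex updates its value from the messages of the previous superstep, sends its new value along outgoing edges if it changed, and otherwise votes to halt. The graph representation and function name are assumptions of the sketch.

```python
# Single-process sketch of a vertex-centric superstep loop: each superstep,
# every vertex reads the messages sent to it in the previous superstep, may
# update its value, and sends messages along its outgoing edges; it "votes to
# halt" by sending nothing, and the run ends when no messages are in flight.
def max_value_propagation(values, out_edges, max_supersteps=30):
    # superstep 0: every vertex announces its value to its out-neighbors
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for w in out_edges.get(v, []):
            inbox[w].append(val)
    for _ in range(max_supersteps):
        outbox = {v: [] for v in values}
        for v in values:
            new_val = max([values[v]] + inbox[v])
            if new_val > values[v]:           # improved: stay active and tell neighbors
                values[v] = new_val
                for w in out_edges.get(v, []):
                    outbox[w].append(new_val)
        if not any(outbox.values()):          # every vertex has voted to halt
            break
        inbox = outbox                        # the barrier between supersteps
    return values

print(max_value_propagation({1: 3, 2: 6, 3: 2, 4: 1},
                            {1: [2], 2: [3], 3: [4], 4: [1]}))
# {1: 6, 2: 6, 3: 6, 4: 6}
```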

3,840 citations

References
Book ChapterDOI
TL;DR: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; analogous inequalities are then obtained for certain sums of dependent random variables such as U-statistics.
Abstract: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.
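
For reference, the best-known of these bounds, when each summand Xᵢ lies in a bounded range [aᵢ, bᵢ] and S = X₁ + … + Xₙ, takes the following form (Hoeffding's inequality, stated here in the abstract's notation):

```latex
\Pr\{\, S - \mathbb{E}S \ge nt \,\}
  \;\le\; \exp\!\left( - \frac{2 n^{2} t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}} \right),
\qquad a_i \le X_i \le b_i,\; i = 1,\dots,n .
```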

8,655 citations

Journal ArticleDOI
TL;DR: The paper defines computing machines, circle and circle-free numbers, and computable sequences and numbers, constructs the universal computing machine, and applies the diagonal process to the Entscheidungsproblem.
Abstract: 1. Computing machines. 2. Definitions: automatic machines; computing machines; circle and circle-free numbers; computable sequences and numbers. 3. Examples of computing machines. 4. Abbreviated tables; further examples. 5. Enumeration of computable sequences. 6. The universal computing machine. 7. Detailed description of the universal machine. 8. Application of the diagonal process. (On computable numbers, with an application to the Entscheidungsproblem, A. M. Turing.)

7,642 citations

Journal ArticleDOI
J. Lawrence Carter, Mark N. Wegman
TL;DR: An input-independent, average linear-time algorithm for storage and retrieval on keys is obtained by choosing a hash function at random from a suitable class of hash functions.
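
A commonly used universal class in the spirit of this work hashes integer keys with h(x) = ((a·x + b) mod p) mod m for a prime p and randomly chosen a, b. The sketch below illustrates that idea; the particular prime, table size, and helper name are arbitrary choices for the example, not the paper's exact construction. Because the function is drawn at random from the class, the expected number of collisions is small for every fixed set of keys, which is what yields the input-independent average linear-time guarantee.

```python
# Illustrative universal hashing: pick h(x) = ((a*x + b) mod p) mod m at random.
import random

def make_universal_hash(m, p=2_147_483_647):  # p: a prime larger than any key
    a = random.randrange(1, p)                # a != 0
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

h = make_universal_hash(m=16)                 # one random member of the class
table = [[] for _ in range(16)]
for key in [3, 17, 42, 999331, 1000003]:
    table[h(key)].append(key)                 # chain keys into buckets
print([bucket for bucket in table if bucket])
```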

2,886 citations

Journal ArticleDOI
TL;DR: A recursive construction is used to obtain a product circuit for solving the prefix problem, and a Boolean circuit of depth 2⌈log2 n⌉ + 2 and size bounded by 14n is obtained for n-bit binary addition.
Abstract: The prefix problem is to compute all the products x1 ∘ x2 ∘ … ∘ xk for 1 ≤ k ≤ n, where ∘ is an associative operation. A recursive construction is used to obtain a product circuit for solving the prefix problem which has depth exactly ⌈log2 n⌉ and size bounded by 4n. An application yields fast, small Boolean circuits to simulate finite-state transducers. By simulating a sequential adder, a Boolean circuit of depth 2⌈log2 n⌉ + 2 and size bounded by 14n is obtained for n-bit binary addition. The size can be decreased significantly by permitting the depth to increase by an additive constant.
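
The recursive idea can be sketched in a few lines: combine adjacent pairs, solve the half-size prefix problem on the pairwise products, then fill in the remaining positions with one more layer of operations. The sequential Python sketch below illustrates that recursion; it mirrors the standard recursive parallel-prefix constructions rather than reproducing the paper's exact depth and size bounds.

```python
# Sequential sketch of the recursive parallel-prefix idea: combine adjacent
# pairs, recurse on the half-size problem, then fill in the remaining outputs.
import operator

def prefix(xs, op):
    """Return [x1, x1∘x2, ..., x1∘...∘xn] for an associative operation op."""
    n = len(xs)
    if n == 1:
        return list(xs)
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, n - 1, 2)]  # one circuit layer
    sub = prefix(pairs, op)           # prefixes ending at positions 2, 4, 6, ... (1-indexed)
    out = [None] * n
    out[0] = xs[0]
    for i, s in enumerate(sub):
        out[2 * i + 1] = s
        if 2 * i + 2 < n:             # one more layer fills the remaining positions
            out[2 * i + 2] = op(s, xs[2 * i + 2])
    return out

print(prefix([1, 2, 3, 4, 5], operator.add))  # [1, 3, 6, 10, 15]
```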

1,159 citations

Journal ArticleDOI
TL;DR: The design for the NYU Ultracomputer is presented, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements that uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputers model of computation.
Abstract: We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements. This machine uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputer model of computation and to implement efficiently the important fetch-and-add synchronization primitive. We outline the hardware that would be required to build a 4096-processor system using 1990s technology. We also discuss system software issues, and present analytic studies of the network performance. Finally, we include a sample of our effort to implement and simulate parallel variants of important scientific programs.
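
Fetch-and-add atomically returns the old value of a memory word while adding to it, which lets many processors claim distinct slots of a shared structure without serializing on a single owner. The sketch below only simulates those semantics with a Python lock to show how the returned old value is used; the class and variable names are invented for the illustration, and it says nothing about the combining network that makes the primitive scalable in hardware.

```python
# Sketch of fetch-and-add semantics (a lock models the atomicity; in the
# Ultracomputer the adds are combined inside the network switches).
import threading

class SharedCounter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, inc=1):
        with self._lock:
            old = self._value             # return the old value ...
            self._value += inc            # ... and atomically add the increment
            return old

# each worker claims a distinct slot index with no further coordination
counter = SharedCounter()
slots = [None] * 8

def worker(name):
    slots[counter.fetch_and_add()] = name

threads = [threading.Thread(target=worker, args=(f"task{k}",)) for k in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(slots)
```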

708 citations