Open Access Journal Article (DOI)

A Fundamental Tradeoff Between Computation and Communication in Distributed Computing

TL;DR
A coded scheme, named “coded distributed computing” (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of r can create novel coding opportunities that reduce the communication load by the same factor.
Abstract
How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other. More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of “Map” and “Reduce” functions distributedly across multiple computing nodes. A coded scheme, named “coded distributed computing” (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of $r$ (i.e., evaluating each function at $r$ carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor. An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized. Finally, the coding techniques of CDC are applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by $1.97\times$–$3.39\times$, for typical settings of interest.
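The tradeoff the abstract describes is simple enough to tabulate. The sketch below assumes K homogeneous nodes and the load definitions from the paper; the closed form L*(r) = (1/r)(1 − r/K) for the optimal communication load is the paper's main result, quoted here from memory:

```python
# Minimal sketch of the computation-communication tradeoff characterized by CDC:
# with K nodes and computation load r, the paper shows the optimal communication
# load is (1/r) * (1 - r/K), an r-fold improvement over the uncoded load 1 - r/K.

def uncoded_load(r: int, K: int) -> float:
    """Communication load with r-fold repetition but no coding."""
    return 1 - r / K

def cdc_load(r: int, K: int) -> float:
    """Optimal communication load achieved by coded distributed computing."""
    return (1 / r) * (1 - r / K)

K = 10
for r in range(1, K + 1):
    print(f"r={r:2d}  uncoded={uncoded_load(r, K):.3f}  cdc={cdc_load(r, K):.3f}")
```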


Citations
Proceedings Article

Gradient Coding: Avoiding Stragglers in Distributed Learning

TL;DR: This work proposes a novel coding-theoretic framework for mitigating stragglers in distributed learning and shows how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous gradient descent.
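To make the idea concrete, here is a toy instance in the spirit of gradient coding: 3 workers, 3 data blocks, and tolerance to any single straggler. The encoding coefficients below are an illustrative choice, not necessarily the paper's construction:

```python
import numpy as np

# Each worker holds 2 of 3 data blocks and sends one coded combination of its
# local per-block gradients; the full gradient is then a fixed linear
# combination of the messages from ANY 2 of the 3 workers.
rng = np.random.default_rng(0)
g1, g2, g3 = (rng.standard_normal(4) for _ in range(3))  # per-block gradients
target = g1 + g2 + g3                                    # full gradient

w1 = 0.5 * g1 + g2        # worker 1 holds blocks {1, 2}
w2 = g2 - g3              # worker 2 holds blocks {2, 3}
w3 = 0.5 * g1 + g3        # worker 3 holds blocks {1, 3}

# Decoding coefficients for every 2-worker subset.
decoders = {(1, 2): (2, -1), (1, 3): (1, 1), (2, 3): (1, 2)}
msgs = {1: w1, 2: w2, 3: w3}
for (i, j), (a, b) in decoders.items():
    assert np.allclose(a * msgs[i] + b * msgs[j], target)
print("full gradient recovered from every 2-of-3 subset")
```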
Proceedings Article

Polynomial codes: an optimal design for high-dimensional coded matrix multiplication

TL;DR: This work considers a large-scale matrix multiplication problem where the computation is carried out by a distributed system with a master node and multiple worker nodes, each of which can store parts of the input matrices. It proposes a computation strategy that leverages ideas from coding theory to design the intermediate computations at the worker nodes, efficiently dealing with straggling workers.
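A minimal sketch of the polynomial-code idea, with each input matrix split in two so that the results of any 4 of 5 workers suffice; the evaluation points and block sizes are arbitrary choices for illustration:

```python
import numpy as np

# Split A into row-blocks A0, A1 and B into column-blocks B0, B1. Worker i
# evaluates P(x_i) = (A0 + A1*x_i) @ (B0 + B1*x_i**2), a degree-3 matrix
# polynomial whose coefficients are exactly the blocks Aj @ Bk, so C = A @ B
# is recovered by interpolating from ANY 4 evaluations (1 straggler tolerated).
rng = np.random.default_rng(1)
A0, A1 = rng.standard_normal((2, 4, 6))
B0, B1 = rng.standard_normal((2, 6, 3))

xs = np.arange(1.0, 6.0)                       # evaluation points for 5 workers
results = [(A0 + A1 * x) @ (B0 + B1 * x**2) for x in xs]

# Master: pretend worker 2 straggles; interpolate entrywise from the other 4.
alive = [0, 1, 3, 4]
V = np.vander(xs[alive], 4, increasing=True)   # Vandermonde for a degree-3 poly
stacked = np.stack([results[i].ravel() for i in alive])
coeffs = np.linalg.solve(V, stacked)           # rows: A0B0, A1B0, A0B1, A1B1

C_top = np.hstack([coeffs[0].reshape(4, 3), coeffs[2].reshape(4, 3)])
C_bot = np.hstack([coeffs[1].reshape(4, 3), coeffs[3].reshape(4, 3)])
C = np.vstack([C_top, C_bot])
assert np.allclose(C, np.vstack([A0, A1]) @ np.hstack([B0, B1]))
print("C recovered from 4 of 5 workers")
```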
Journal Article (DOI)

The Exact Rate-Memory Tradeoff for Caching With Uncoded Prefetching

TL;DR: A novel caching scheme is proposed, which strictly improves the state of the art by exploiting commonality among user demands; the rate-memory tradeoff is also fully characterized for a decentralized setting, in which users fill their cache content without any coordination.
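The closed-form tradeoff can be evaluated directly. The sketch below assumes K users, N files, and cache size M = tN/K for integer t, with the rate formula as commonly stated for this result (quoted from memory, so treat it as a sketch rather than a definitive statement):

```python
from math import comb

# Exact rate-memory tradeoff under uncoded prefetching:
#   R*(t) = [C(K, t+1) - C(K - min(N, K), t+1)] / C(K, t),
# which reduces to (K - t)/(t + 1) when N >= K.
def rate(K: int, N: int, t: int) -> float:
    return (comb(K, t + 1) - comb(K - min(N, K), t + 1)) / comb(K, t)

K, N = 10, 20
for t in range(K + 1):
    print(f"M = {t * N / K:5.1f}  R* = {rate(K, N, t):.3f}")
```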
Journal Article (DOI)

The Role of Caching in Future Communication Systems and Networks

TL;DR: Caching has been studied for more than 40 years and has recently received increased attention from industry and academia; this survey aims to convince the reader that content caching is an exciting research topic for future communication systems and networks.
References
Journal Article (DOI)

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets that runs on large clusters of commodity machines and is highly scalable.
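The programming model is easy to illustrate with the canonical word-count example, here as a single-process Python sketch (the real system distributes the map, shuffle, and reduce phases across a cluster):

```python
from collections import defaultdict
from itertools import chain

# Word count in the MapReduce model: map emits (word, 1) pairs, the framework
# groups them by key, and reduce sums the counts per word.
def map_fn(line: str):
    for word in line.split():
        yield word, 1

def reduce_fn(word: str, counts):
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle/group step, normally done by the framework.
groups = defaultdict(list)
for word, one in chain.from_iterable(map_fn(l) for l in lines):
    groups[word].append(one)

print(dict(reduce_fn(w, c) for w, c in groups.items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```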
Journal Article (DOI)

Network information flow

TL;DR: This work reveals that it is in general not optimal to regard the information to be multicast as a "fluid" which can simply be routed or replicated, and that, by employing coding at the nodes, which the work refers to as network coding, bandwidth can in general be saved.
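The classic butterfly-network example shows the gain: with unit-capacity edges, plain routing cannot deliver both source bits to both sinks in a single network use, but XOR-ing at the bottleneck node can:

```python
# Butterfly-network example of network coding: two source bits b1, b2 must be
# multicast to two sinks. The shared middle edge forwards b1 XOR b2, and each
# sink combines it with the bit it receives directly.
b1, b2 = 1, 0

coded = b1 ^ b2           # bottleneck node forwards b1 XOR b2 to both sinks

sink1 = (b1, coded ^ b1)  # sink 1 gets b1 directly, recovers b2 from the XOR
sink2 = (coded ^ b2, b2)  # sink 2 gets b2 directly, recovers b1 from the XOR
assert sink1 == (b1, b2) and sink2 == (b1, b2)
print("both sinks recover both bits:", sink1, sink2)
```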
Journal Article (DOI)

The Google file system

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Proceedings Article

Spark: cluster computing with working sets

TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
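A minimal PySpark sketch of the working-set idea (assumes a local Spark installation; the dataset size and iteration count are arbitrary): caching keeps the parsed dataset in memory, so the repeated passes of an iterative job avoid re-reading it each time:

```python
from pyspark import SparkContext

# Cache the working set once, then reuse it across iterations.
sc = SparkContext("local[*]", "working-set-demo")
points = sc.parallelize(range(1_000_000)).map(lambda x: (x % 10, x)).cache()

for _ in range(5):          # iterative job reuses the cached RDD
    total = points.values().sum()
print(total)
sc.stop()
```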
Proceedings Article (DOI)

Fog computing and its role in the internet of things

TL;DR: This paper argues that the above characteristics make the Fog the appropriate platform for a number of critical Internet of Things services and applications, namely, Connected Vehicle, Smart Grid, Smart Cities, and, in general, Wireless Sensors and Actuators Networks (WSANs).