Communication-Efficient Distributed Deep Learning: A Comprehensive Survey.

Open Access, Posted Content

- 10 Mar 2020 -

arXiv: Distributed, Parallel, and Cluste...

TLDR

This paper surveys communication-efficient distributed training algorithms at both the system level and the algorithmic level, helping readers understand which algorithms are more efficient under specific distributed environments and extrapolate potential directions for further optimization.

Abstract:

Distributed deep learning has become a common way to reduce overall training time by exploiting multiple computing devices (e.g., GPUs/TPUs) as deep models and data sets grow. However, data communication between computing devices can become a bottleneck that limits system scalability, and how to address this communication problem has recently become an active research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations. At the system level, we demystify the system design and implementation choices that reduce communication cost. At the algorithmic level, we compare different algorithms in terms of their theoretical convergence bounds and communication complexity. Specifically, we first propose a taxonomy of data-parallel distributed training algorithms with four main dimensions: communication synchronization, system architectures, compression techniques, and parallelism of communication and computing. We then discuss studies that address problems in each of the four dimensions and compare their communication cost. We further compare the convergence rates of different algorithms, which indicate how fast the algorithms converge to a solution in terms of iterations. Based on the system-level communication cost analysis and the theoretical convergence speed comparison, we help readers understand which algorithms are more efficient under specific distributed environments and extrapolate potential directions for further optimization.
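As a rough illustration of the "compression techniques" dimension named in the abstract, the sketch below simulates one synchronous data-parallel step in which each worker sends only its k largest-magnitude gradient entries and keeps the rest as a local residual (top-k sparsification with error feedback). It is a minimal sketch, not the survey's own method: the two workers, their gradient values, and the function names `top_k_sparsify` and `average_sparse` are all hypothetical.

```python
from typing import List, Tuple

# (index, value) pairs that a worker would transmit instead of a dense gradient.
Sparse = List[Tuple[int, float]]


def top_k_sparsify(grad: List[float], k: int) -> Tuple[Sparse, List[float]]:
    """Keep the k largest-magnitude entries; return them plus the unsent residual."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    sparse = [(i, grad[i]) for i in sorted(keep)]
    residual = [0.0 if i in keep else g for i, g in enumerate(grad)]
    return sparse, residual


def average_sparse(messages: List[Sparse], dim: int) -> List[float]:
    """Synchronous aggregation: average the sparse messages from all workers."""
    total = [0.0] * dim
    for msg in messages:
        for i, v in msg:
            total[i] += v
    return [t / len(messages) for t in total]


# Hypothetical setup: two workers, each with a 4-dimensional local gradient.
local_grads = [[0.5, -0.1, 0.05, 2.0], [-1.5, 0.2, 0.0, 1.0]]
residuals = [[0.0] * 4 for _ in local_grads]

messages = []
for w, grad in enumerate(local_grads):
    # Error feedback: add back whatever was dropped in earlier rounds.
    corrected = [g + r for g, r in zip(grad, residuals[w])]
    sparse, residuals[w] = top_k_sparsify(corrected, k=2)
    messages.append(sparse)

update = average_sparse(messages, dim=4)
sent = sum(len(m) for m in messages)
print(update, f"sent {sent} of {4 * len(local_grads)} values")
```

In this toy setting each worker transmits half of its gradient entries; the dropped entries stay in `residuals` and would be added to the next round's gradients, which is the usual reason such schemes are paired with error feedback.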

Citations

Posted Content

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

Sergi Abadal, +4 more

- 30 Sep 2020 -

arXiv: Learning

TL;DR: A review of the field of GNNs is presented from the perspective of computing, and an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled.

Posted Content

On-Device Machine Learning: An Algorithms and Learning Theory Perspective.

Sauptik Dhar, +5 more

- 02 Nov 2019 -

arXiv: Learning

TL;DR: This survey reformulates the problem of on-device learning as resource-constrained learning, where the resources are compute and memory, so that tools, techniques, and algorithms from a wide variety of research areas can be compared equitably.

Journal Article (DOI)

HSIC Bottleneck Based Distributed Deep Learning Model for Load Forecasting in Smart Grid With a Comprehensive Survey

Md. Akhtaruzzaman, +5 more

- 24 Nov 2020 -

IEEE Access

TL;DR: A conceptual model of distributed deep learning (DDL) for smart grids is presented, incorporating the HSIC (Hilbert-Schmidt Independence Criterion) Bottleneck technique to provide higher accuracy.

Journal Article (DOI)

Distributed Artificial Intelligence-as-a-Service (DAIaaS) for Smarter IoE and 6G Environments.

Nourah Janbi, +3 more

- 13 Oct 2020 -

Sensors

TL;DR: A framework for Distributed AI as a Service (DAIaaS) provisioning for Internet of Everything (IoE) and 6G environments is proposed to facilitate standardization of distributed AI provisioning, allow developers to focus on the domain-specific details without worrying about distributed training and inference, and help systemize the mass-production of technologies for smarter environments.

Posted Content

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Torsten Hoefler, +4 more

- 31 Jan 2021 -

arXiv: Learning

TL;DR: Sparsity can reduce the memory footprint of regular networks to fit mobile devices and shorten training time for ever-growing networks; selectively pruning components can reduce the energy and performance costs of deep learning.

References

Proceedings Article (DOI)

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This paper proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting residual networks won 1st place in the ILSVRC 2015 classification task.

Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

TL;DR: This paper proposes a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

Proceedings Article (DOI)

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Journal Article (DOI)

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018 -

arXiv: Computation and Language

TL;DR: BERT, a new language representation model, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
