Open Access, Posted Content
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey.
TL;DR: A comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations, is provided to help readers understand which algorithms are more efficient under specific distributed environments and to extrapolate potential directions for further optimization.
Abstract:
Distributed deep learning has become common as a way to reduce overall training time by exploiting multiple computing devices (e.g., GPUs/TPUs) as deep models and data sets grow in size. However, data communication between computing devices can become a bottleneck that limits system scalability. How to address this communication problem has recently become a hot research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations. At the system level, we demystify the system design and implementation choices that reduce communication cost. At the algorithmic level, we compare different algorithms in terms of their theoretical convergence bounds and communication complexity. Specifically, we first propose a taxonomy of data-parallel distributed training algorithms with four main dimensions: communication synchronization, system architectures, compression techniques, and parallelism of communication and computing. We then discuss studies that address problems along these four dimensions and compare their communication costs. We further compare the convergence rates of different algorithms, which indicate how fast each algorithm converges to a solution in terms of iterations. Based on the system-level communication cost analysis and the theoretical convergence speed comparison, we help readers understand which algorithms are more efficient under specific distributed environments and extrapolate potential directions for further optimization.
Citations
Posted Content
Computing Graph Neural Networks: A Survey from Algorithms to Accelerators
TL;DR: A review of the field of GNNs is presented from the perspective of computing, and an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled.
Posted Content
On-Device Machine Learning: An Algorithms and Learning Theory Perspective.
TL;DR: This survey reformulates the problem of on-device learning as resource constrained learning where the resources are compute and memory to allow tools, techniques, and algorithms from a wide variety of research areas to be compared equitably.
Journal Article
HSIC Bottleneck Based Distributed Deep Learning Model for Load Forecasting in Smart Grid With a Comprehensive Survey
Md. Akhtaruzzaman, Mohammad Kamrul Hasan, S. Rayhan Kabir, Siti Norul Huda Sheikh Abdullah, Muhammad Jafar Sadeq, Eklas Hossain
TL;DR: A conceptual model of DDL for smart grids has been presented, where the HSIC (Hilbert-Schmidt Independence Criterion) Bottleneck technique has been incorporated to provide higher accuracy.
Journal Article
Distributed Artificial Intelligence-as-a-Service (DAIaaS) for Smarter IoE and 6G Environments.
TL;DR: A framework for Distributed AI as a Service (DAIaaS) provisioning for Internet of Everything (IoE) and 6G environments is proposed to facilitate standardization of distributed AI provisioning, allow developers to focus on the domain-specific details without worrying about distributed training and inference, and help systemize the mass-production of technologies for smarter environments.
Posted Content
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
TL;DR: Sparsity can reduce the memory footprint of regular networks to fit mobile devices and shorten training time for ever-growing networks; selectively pruning components can reduce the energy and performance costs of deep learning.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: A residual learning framework is proposed to ease the training of networks that are substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Proceedings Article
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal Article
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Related Papers (5)
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
TensorFlow: a system for large-scale machine learning
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul A. Tucker, Vijay K. Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng