Open Access Proceedings Article

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

TLDR
Deep Gradient Compression (DGC) employs momentum correction, local gradient clipping, momentum factor masking, and warm-up training to preserve accuracy during compression, and achieves gradient compression ratios from 270x to 600x without losing accuracy.
Abstract
Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections. In this paper, we find 99.9% of the gradient exchange in distributed SGD is redundant, and propose Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth. To preserve accuracy during compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We have applied Deep Gradient Compression to image classification, speech recognition, and language modeling with multiple datasets including Cifar10, ImageNet, Penn Treebank, and Librispeech Corpus. In these scenarios, Deep Gradient Compression achieves a gradient compression ratio from 270x to 600x without losing accuracy, cutting the gradient size of ResNet-50 from 97MB to 0.35MB, and for DeepSpeech from 488MB to 0.74MB. Deep Gradient Compression enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile.
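
The abstract names the techniques but does not spell out the update rule. As a rough illustration only, the sketch below shows one worker-side step that combines local accumulation, momentum correction, and momentum factor masking, assuming entries are selected by largest magnitude. The function name `sparsify_step`, its parameters, and the top-k selection rule are assumptions made for this example, not details stated on this page; gradient clipping and warm-up are left out.

```python
from typing import List, Tuple


def sparsify_step(
    grad: List[float],
    velocity: List[float],
    residual: List[float],
    momentum: float = 0.9,
    keep_ratio: float = 0.001,
) -> Tuple[List[Tuple[int, float]], List[float], List[float]]:
    """One hypothetical worker-side step: accumulate locally, send only the largest entries."""
    n = len(grad)
    # Momentum correction: apply momentum to the local gradient before accumulating.
    velocity = [momentum * u + g for u, g in zip(velocity, grad)]
    # Local accumulation: values that were not sent keep adding up across steps.
    residual = [v + u for v, u in zip(residual, velocity)]
    # Select the k entries with the largest magnitude (at least one).
    k = max(1, int(n * keep_ratio))
    top = sorted(range(n), key=lambda i: abs(residual[i]), reverse=True)[:k]
    # The sparse message is a list of (index, value) pairs.
    sparse = [(i, residual[i]) for i in sorted(top)]
    # Momentum factor masking: clear both buffers at the positions just sent.
    for i in top:
        residual[i] = 0.0
        velocity[i] = 0.0
    return sparse, velocity, residual


# Usage: keep 1 of 4 entries per step; the rest stay in the local buffers.
zeros = [0.0, 0.0, 0.0, 0.0]
sparse, velocity, residual = sparsify_step(
    [0.1, -2.0, 0.3, 0.05], zeros, zeros, keep_ratio=0.25
)
print(sparse)
```

With a keep ratio of 0.001, each step would transmit about 0.1% of the entries, which matches the abstract's observation that 99.9% of the gradient exchange is redundant; the index overhead of the sparse message is ignored here.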


Citations
Posted Content

Federated Learning with Non-IID Data.

TL;DR: This work presents a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices, and shows that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
Proceedings Article

Exploiting Unintended Feature Leakage in Collaborative Learning

TL;DR: In this article, passive and active inference attacks are proposed to exploit the leakage of information about participants' training data in federated learning, where each participant can infer the presence of exact data points and properties that hold only for a subset of the training data and are independent of the properties of the joint model.
Journal Article

Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing

TL;DR: This paper presents a comprehensive survey of recent research on edge intelligence: the authors review the background and motivation for running AI at the network edge and give an overview of the overarching architectures, frameworks, and emerging key technologies for training and inference of deep learning models at the edge.
Journal Article

Federated Learning in Mobile Edge Networks: A Comprehensive Survey

TL;DR: Federated learning (FL) has been proposed to enable collaborative training of an ML model and also to enable DL for mobile edge network optimization in large-scale and complex mobile edge networks, where heterogeneous devices with varying constraints are involved.