Proceedings ArticleDOI
Gist: efficient data encoding for deep neural network training
Animesh Jain, Amar Phanishayee, Jason Mars, Lingjia Tang, Gennady Pekhimenko
- pp. 776–789
TL;DR
This paper investigates widely used DNNs and finds that the major contributors to memory footprint are intermediate layer outputs (feature maps), and introduces a framework for DNN-layer-specific optimizations that significantly reduce this source of main memory pressure on GPUs.
Abstract
Training modern deep neural networks (DNNs) typically relies on GPUs to train complex hundred-layer-deep networks. A significant problem facing both researchers and industry practitioners is that, as networks get deeper, the available GPU main memory becomes a primary bottleneck, limiting the size of the networks that can be trained. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduce this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally. Our key approach is to store an encoded representation of feature maps for this temporal gap and decode the data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, our system that employs two classes of layer-specific encoding schemes -- lossless and lossy -- to exploit existing value redundancy in DNN training and significantly reduce the memory consumption of targeted feature maps. For example, one insight is that, by taking advantage of the computational nature of back propagation from the pool to the ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2X across 5 state-of-the-art image classification DNNs, with an average of 1.8X, at only 4% performance overhead. We also show that further software (e.g., CuDNN) and hardware (e.g., dynamic allocation) optimizations can result in an even larger footprint reduction (up to 4.1X).
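The 1-bit pool-to-ReLU insight can be made concrete. The sketch below (plain NumPy, with illustrative function names; it is not Gist's CNTK implementation) captures the ReLU half of the idea: the backward pass only needs to know which inputs were positive, so during the gap between a feature map's two uses a packed bit-mask can stand in for the 32-bit values.

```python
import numpy as np

def relu_forward(x):
    y = np.maximum(x, 0.0)
    # Encode: keep only a packed 1-bit positivity mask for the backward pass.
    # np.packbits stores 8 mask bits per byte, ~32x smaller than float32.
    mask_bits = np.packbits(x.ravel() > 0)
    return y, mask_bits

def relu_backward(grad_y, mask_bits, shape):
    # Decode the mask and route gradients only through positions that were > 0.
    mask = np.unpackbits(mask_bits, count=int(np.prod(shape))).reshape(shape)
    return grad_y * mask

x = np.random.randn(2, 3, 4, 4).astype(np.float32)
y, bits = relu_forward(x)  # full-fidelity y feeds the next layer, then is freed
grad_x = relu_backward(np.ones_like(y), bits, x.shape)
print(x.nbytes, "bytes of feature map ->", bits.nbytes, "bytes kept for backward")
```

The encoded form is exact with respect to what the backward pass actually needs, which is why the paper classifies this scheme as lossless.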
Citations
Journal ArticleDOI
Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
TL;DR: This article reviews mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents the state-of-the-art hardware architectures.
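Of the approaches listed, data quantization is the simplest to show in a few lines. Below is a minimal sketch (our own names and sizes, not code from the survey) of per-tensor uniform affine quantization, mapping float32 weights onto uint8 for a 4x size reduction:

```python
import numpy as np

def quantize_uint8(w):
    """Per-tensor affine quantization: w ~ q * scale + offset, q in [0, 255]."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    q = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return q, scale, lo

def dequantize_uint8(q, scale, offset):
    return q.astype(np.float32) * scale + offset

w = np.random.randn(1 << 20).astype(np.float32)
q, scale, offset = quantize_uint8(w)
err = np.abs(dequantize_uint8(q, scale, offset) - w).max()
print(f"{w.nbytes} -> {q.nbytes} bytes, max abs error {err:.4f}")
```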
Proceedings ArticleDOI
PipeDream: generalized pipeline parallelism for DNN training
Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, Matei Zaharia
TL;DR: PipeDream is presented, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when possible.
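The gain from inter-batch pipelining is easiest to see in a schedule table. The toy below is a generic fill-and-drain pipeline, not PipeDream's actual 1F1B scheduler: with S stages and M micro-batches, stage s works on micro-batch t - s at step t, so in steady state all stages are busy instead of waiting for one batch to finish end to end.

```python
# Illustrative sizes only.
S, M = 4, 8  # pipeline stages, micro-batches in flight

for t in range(M + S - 1):
    row = []
    for s in range(S):
        m = t - s  # stage s lags the head of the pipeline by s steps
        row.append(f"mb{m}" if 0 <= m < M else "idle")
    print(f"t={t:2d}  " + "  ".join(f"{c:>4}" for c in row))
```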
Proceedings ArticleDOI
Machine Learning at Facebook: Understanding Inference at the Edge
Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang +26 more
TL;DR: This paper takes a data-driven approach to present the opportunities and design challenges faced by Facebook in enabling machine learning inference locally on smartphones and other edge platforms.
Posted Content
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
TL;DR: This work develops a novel solution, the Zero Redundancy Optimizer (ZeRO), that achieves both memory efficiency and scaling efficiency, and demonstrates that ZeRO has the potential to scale beyond 1 trillion parameters using today's hardware.
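The core of ZeRO's first stage is simple arithmetic: shard the optimizer states across the N data-parallel workers instead of replicating them on every GPU. For Adam with mixed-precision training, the states are roughly an fp32 master copy of the weights plus two fp32 moments, about 12 bytes per parameter. A back-of-the-envelope sketch with illustrative numbers (not code from the paper):

```python
def adam_state_bytes(n_params, n_workers, sharded):
    per_param = 4 * 3  # fp32 master weights + momentum + variance
    total = n_params * per_param
    return total // n_workers if sharded else total

N_PARAMS = 1_000_000_000  # a 1B-parameter model, for illustration
for workers in (1, 8, 64):
    rep = adam_state_bytes(N_PARAMS, workers, sharded=False)
    shd = adam_state_bytes(N_PARAMS, workers, sharded=True)
    print(f"{workers:3d} workers: {rep / 2**30:5.1f} GiB replicated per GPU"
          f" vs {shd / 2**30:5.2f} GiB sharded per GPU")
```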
Proceedings ArticleDOI
RecNMP: accelerating personalized recommendation with near-memory processing
Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang, Brandon Reagen, Carole-Jean Wu, Mark Hempstead, Xuan Zhang +20 more
TL;DR: RecNMP is a lightweight, commodity-DRAM-compliant near-memory processing solution that accelerates personalized recommendation inference and is specifically tailored to production environments with heavy co-location of operators on a single server.
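The kernel being accelerated here is the memory-bound gather-and-reduce over large embedding tables that dominates recommendation inference (the SparseLengthsSum-style operators in DLRM-like models). A NumPy sketch of the access pattern, with made-up sizes:

```python
import numpy as np

table = np.random.randn(1_000_000, 64).astype(np.float32)  # rows x embedding dim
indices = np.random.randint(0, table.shape[0], size=40)    # one request's sparse ids
pooled = table[indices].sum(axis=0)                        # gather + sum pooling
print(pooled.shape)  # (64,): a tiny result from many scattered DRAM reads
```

The output is small but the reads are scattered across a multi-gigabyte table, which is why pushing the gather-reduce near memory pays off.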
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won first place in the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors achieve state-of-the-art image classification performance with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
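That description maps directly onto a short PyTorch module. The hyperparameters below follow the AlexNet paper (assuming 227x227 inputs); this is a sketch for orientation, not the authors' code:

```python
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 1000),  # 1000-way logits; softmax is applied in the loss
)
```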
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
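The "very small" filters are 3x3, and the arithmetic behind that choice is quick to check: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, with fewer weights and an extra non-linearity in between.

```python
# Weight counts for C input and C output channels (biases ignored).
C = 256
one_5x5 = 5 * 5 * C * C        # a single 5x5 conv layer
two_3x3 = 2 * (3 * 3 * C * C)  # two stacked 3x3 conv layers
print(one_5x5, "vs", two_3x3)  # 1638400 vs 1179648: ~28% fewer weights
```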
Posted Content
Deep Residual Learning for Image Recognition
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
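The residual idea reduces to a few lines: the stacked layers learn a residual function F(x), and the block outputs F(x) + x through an identity shortcut. A minimal PyTorch sketch with our own naming, not the authors' code:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(  # the layers that learn the residual F(x)
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)  # identity shortcut: output F(x) + x

y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32]); the shortcut preserves shape
```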
Related Papers (5)
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Albert T. Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Christopher Aaron Clark, Jeremy Coriell, Michael J. Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William John Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, D. Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Khaitan Harshit, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andrew Everett Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Michael Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay K. Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon +75 more