Proceedings ArticleDOI

Gist: efficient data encoding for deep neural network training

TLDR
This paper investigates widely used DNNs and finds that the major contributors to memory footprint are intermediate layer outputs (feature maps), and introduces a framework for DNN-layer-specific optimizations that significantly reduce this source of main memory pressure on GPUs.
Abstract
Modern deep neural network (DNN) training typically relies on GPUs to train complex hundred-layer deep networks. A significant problem facing both researchers and industry practitioners is that, as the networks get deeper, the available GPU main memory becomes a primary bottleneck, limiting the size of networks they can train. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduce this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally. Our key approach is to store an encoded representation of feature maps for this temporal gap and decode this data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, our system that employs two classes of layer-specific encoding schemes -- lossless and lossy -- to exploit existing value redundancy in DNN training to significantly reduce the memory consumption of targeted feature maps. For example, one insight is that, by taking advantage of the computational nature of back propagation from pool to ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2X across 5 state-of-the-art image classification DNNs, with an average of 1.8X, with only 4% performance overhead. We also show that further software (e.g., CuDNN) and hardware (e.g., dynamic allocation) optimizations can result in an even larger footprint reduction (up to 4.1X).
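
To make the 1-bit insight concrete: during backpropagation, a ReLU layer only needs to know which of its outputs were positive, so during the forward-to-backward gap the 32-bit feature map can be replaced by a packed bitmask at 1/32 of the storage. The NumPy sketch below is a simplified illustration of the ReLU side of that pool-to-ReLU computation; the function names are hypothetical and this is not Gist's actual CNTK implementation.

```python
import numpy as np

def relu_forward_encoded(x):
    # Full-fidelity output, used immediately by the next (forward) layer.
    y = np.maximum(x, 0.0)
    # Lossless encoding for the forward/backward temporal gap:
    # ReLU's backward pass only needs to know which outputs were
    # positive, so keep a packed 1-bit mask instead of 32-bit floats.
    mask = np.packbits(y.ravel() > 0)
    return y, mask  # y can be relinquished once the next layer consumes it

def relu_backward_decoded(dy, mask):
    # Decode the bitmask and route gradients: dx = dy where input > 0.
    bits = np.unpackbits(mask, count=dy.size).reshape(dy.shape)
    return dy * bits

# Usage sketch: the decoded gradient matches the uncompressed ReLU rule.
x = np.random.randn(4, 16).astype(np.float32)
y, mask = relu_forward_encoded(x)
dy = np.ones_like(y)
dx = relu_backward_decoded(dy, mask)
assert np.array_equal(dx, (x > 0).astype(np.float32))
```

The packed mask costs one bit per value rather than 32, which is the source of the memory saving the abstract describes for ReLU-to-pool layer pairs.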


Citations
Journal ArticleDOI

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey

TL;DR: This article reviews mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents state-of-the-art hardware architectures.
Proceedings ArticleDOI

PipeDream: generalized pipeline parallelism for DNN training

TL;DR: This paper presents PipeDream, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when possible.
Posted Content

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

TL;DR: This work develops a novel solution, the Zero Redundancy Optimizer (ZeRO), to optimize memory, achieving both memory efficiency and scaling efficiency, and demonstrates that ZeRO has the potential to scale beyond 1 trillion parameters using today's hardware.
Proceedings ArticleDOI

RecNMP: accelerating personalized recommendation with near-memory processing

TL;DR: RecNMP is a lightweight, commodity-DRAM-compliant, near-memory processing solution that accelerates personalized recommendation inference and is specifically tailored to production environments with heavy co-location of operators on a single server.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; this approach won 1st place in the ILSVRC 2015 classification task.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on the ImageNet classification benchmark.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.