Open Access · Posted Content

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

TLDR
In this article, the authors propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs.
Abstract
The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers researchers' flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs so that both GPU and CPU memory can be utilized simultaneously for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a significant reduction in the memory requirements of DNNs. Similar experiments on VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the memory efficiency of our proposal. vDNN enables VGG-16 with batch size 256 (requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card containing 12 GB of memory, with an 18% performance loss compared to a hypothetical, oracular GPU with enough memory to hold the entire DNN.
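The core mechanism is offloading a layer's feature maps to host memory during the forward pass and prefetching them back just before they are needed in the backward pass, using pinned host buffers and a dedicated transfer stream so copies overlap with computation. The sketch below illustrates that idea with the raw CUDA runtime API; the buffer size, stream layout, and the commented-out kernel names are illustrative assumptions, not the paper's actual implementation.

// Minimal sketch of the offload/prefetch idea, assuming one layer's
// feature map and a dedicated "memory" stream overlapped with a
// "compute" stream. Sizes and kernels are illustrative only.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err = (call);                                        \
    if (err != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));  \
      exit(1);                                                       \
    }                                                                \
  } while (0)

int main() {
  const size_t bytes = 64 << 20;  // one layer's feature map (illustrative)

  float *d_fmap;  // device copy used by the current layer
  float *h_fmap;  // pinned host buffer backing the offload
  CHECK(cudaMalloc(&d_fmap, bytes));
  CHECK(cudaMallocHost(&h_fmap, bytes));  // pinned => async copies can overlap

  cudaStream_t compute, memory;
  CHECK(cudaStreamCreate(&compute));
  CHECK(cudaStreamCreate(&memory));

  cudaEvent_t offloaded;
  CHECK(cudaEventCreate(&offloaded));

  // Forward pass (sketch): after the layer's kernels are enqueued on
  // `compute`, copy its input feature map out to host memory on `memory`,
  // overlapping the transfer with the next layer's computation.
  // layer_forward<<<grid, block, 0, compute>>>(...);  // hypothetical kernel
  CHECK(cudaMemcpyAsync(h_fmap, d_fmap, bytes, cudaMemcpyDeviceToHost, memory));
  CHECK(cudaEventRecord(offloaded, memory));

  // Once the offload completes, the device copy can be released and the
  // freed space reused by later layers.
  CHECK(cudaEventSynchronize(offloaded));
  CHECK(cudaFree(d_fmap));

  // Backward pass (sketch): prefetch the feature map back into a fresh
  // device buffer just before the layer's gradient kernels need it.
  CHECK(cudaMalloc(&d_fmap, bytes));
  CHECK(cudaMemcpyAsync(d_fmap, h_fmap, bytes, cudaMemcpyHostToDevice, memory));
  CHECK(cudaStreamSynchronize(memory));
  // layer_backward<<<grid, block, 0, compute>>>(...);  // hypothetical kernel

  CHECK(cudaFreeHost(h_fmap));
  CHECK(cudaFree(d_fmap));
  CHECK(cudaStreamDestroy(compute));
  CHECK(cudaStreamDestroy(memory));
  CHECK(cudaEventDestroy(offloaded));
  return 0;
}

Overlap is what keeps the performance loss modest: as long as the offload of a layer's feature map finishes behind the forward computation of later layers, and the prefetch completes before the backward pass reaches that layer, the transfers are largely hidden.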


Citations
Posted Content

Training Deep Nets with Sublinear Memory Cost

TL;DR: This work designs an algorithm that costs O(√n) memory to train an n-layer network, at the computational cost of only one extra forward pass per mini-batch, showing that computation can be traded for memory to obtain a more memory-efficient training algorithm with little extra computation.
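A back-of-the-envelope derivation of the O(√n) bound (a sketch assuming uniform per-layer activation memory, not the paper's full analysis): split the n layers into k segments, keep only the k segment-boundary activations, and recompute the at most n/k activations inside a segment when its backward pass runs:

\[
M(k) \approx k + \frac{n}{k}, \qquad
\frac{dM}{dk} = 1 - \frac{n}{k^2} = 0 \;\Rightarrow\; k = \sqrt{n}, \qquad
M(\sqrt{n}) \approx 2\sqrt{n} = O(\sqrt{n}).
\]

The recomputation across all segments amounts to at most one extra full forward pass per mini-batch, matching the stated computational cost.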
Proceedings Article · DOI

The Architectural Implications of Autonomous Driving: Constraints and Acceleration

TL;DR: With accelerator-based designs, this work is able to build an end-to-end autonomous driving system that meets all the design constraints, and explore the trade-offs among performance, power and the higher accuracy enabled by higher resolution cameras.
Proceedings Article · DOI

Superneurons: dynamic GPU memory management for training deep neural networks

TL;DR: SuperNeurons proposes a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity, reducing network-wide peak memory usage to the maximum memory usage among individual layers.
Proceedings Article · DOI

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

TL;DR: In this article, the authors present a vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-memory processing cores tailored for DL tensor operations.
References
More filters
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting residual networks won first place in the ILSVRC 2015 classification task.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of convolutional network depth on accuracy in the large-scale image recognition setting and showed that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article · DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for document recognition; gradient-based learning is shown to synthesize complex decision surfaces that can classify high-dimensional patterns such as handwritten characters.
Proceedings Article

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

TL;DR: Deep Compression proposes a three-stage pipeline of pruning, trained quantization, and Huffman coding that reduces the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Proceedings Article

Learning both weights and connections for efficient neural networks

TL;DR: In this paper, the authors proposed a three-step method that learns only the important connections, reducing the storage and computation required by neural networks by an order of magnitude without affecting their accuracy.
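The pruning step of this three-step method (train, prune low-magnitude connections, retrain) is simple to express on the GPU. Below is a minimal CUDA sketch of magnitude pruning; the kernel name, threshold, and launch configuration are hypothetical, illustrative choices rather than the authors' code.

#include <cuda_runtime.h>

// Zero out connections whose magnitude falls below a threshold.
// After pruning, training continues on the surviving weights
// (the "retrain" step), with pruned entries held at zero.
__global__ void prune_below(float *w, int n, float threshold) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && fabsf(w[i]) < threshold) {
    w[i] = 0.0f;
  }
}

// Usage sketch, assuming d_weights is a device array of n weights:
//   prune_below<<<(n + 255) / 256, 256>>>(d_weights, n, 0.01f);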