Open Access · Posted Content

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

TLDR
In this article, the authors propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs.
Abstract
The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers researchers' flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs so that both GPU and CPU memory can be utilized simultaneously for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a significant reduction in the memory requirements of DNNs. Similar experiments on VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the memory efficiency of our proposal. vDNN enables VGG-16 with batch size 256 (requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card containing 12 GB of memory, with an 18% performance loss compared to a hypothetical, oracular GPU with enough memory to hold the entire DNN.
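The core mechanism is offloading a layer's feature maps to host memory during the forward pass and prefetching them back just before they are needed in the backward pass, using pinned host buffers and a dedicated transfer stream so copies overlap with computation. The sketch below illustrates that idea with the raw CUDA runtime API; the buffer size, stream layout, and the commented-out kernel names are illustrative assumptions, not the paper's actual implementation.

// Minimal sketch of the offload/prefetch idea, assuming one layer's
// feature map and a dedicated "memory" stream overlapped with a
// "compute" stream. Sizes and kernels are illustrative only.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err = (call);                                        \
    if (err != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));  \
      exit(1);                                                       \
    }                                                                \
  } while (0)

int main() {
  const size_t bytes = 64 << 20;  // one layer's feature map (illustrative)

  float *d_fmap;  // device copy used by the current layer
  float *h_fmap;  // pinned host buffer backing the offload
  CHECK(cudaMalloc(&d_fmap, bytes));
  CHECK(cudaMallocHost(&h_fmap, bytes));  // pinned => async copies can overlap

  cudaStream_t compute, memory;
  CHECK(cudaStreamCreate(&compute));
  CHECK(cudaStreamCreate(&memory));

  cudaEvent_t offloaded;
  CHECK(cudaEventCreate(&offloaded));

  // Forward pass (sketch): after the layer's kernels are enqueued on
  // `compute`, copy its input feature map out to host memory on `memory`,
  // overlapping the transfer with the next layer's computation.
  // layer_forward<<<grid, block, 0, compute>>>(...);  // hypothetical kernel
  CHECK(cudaMemcpyAsync(h_fmap, d_fmap, bytes, cudaMemcpyDeviceToHost, memory));
  CHECK(cudaEventRecord(offloaded, memory));

  // Once the offload completes, the device copy can be released and the
  // freed space reused by later layers.
  CHECK(cudaEventSynchronize(offloaded));
  CHECK(cudaFree(d_fmap));

  // Backward pass (sketch): prefetch the feature map back into a fresh
  // device buffer just before the layer's gradient kernels need it.
  CHECK(cudaMalloc(&d_fmap, bytes));
  CHECK(cudaMemcpyAsync(d_fmap, h_fmap, bytes, cudaMemcpyHostToDevice, memory));
  CHECK(cudaStreamSynchronize(memory));
  // layer_backward<<<grid, block, 0, compute>>>(...);  // hypothetical kernel

  CHECK(cudaFreeHost(h_fmap));
  CHECK(cudaFree(d_fmap));
  CHECK(cudaStreamDestroy(compute));
  CHECK(cudaStreamDestroy(memory));
  CHECK(cudaEventDestroy(offloaded));
  return 0;
}

Overlap is what keeps the performance loss modest: as long as the offload of a layer's feature map finishes behind the forward computation of later layers, and the prefetch completes before the backward pass reaches that layer, the transfers are largely hidden.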


Citations
Posted Content

Training Deep Nets with Sublinear Memory Cost

TL;DR: This work designs an algorithm that costs O(√n) memory to train an n-layer network, at the computational cost of only one extra forward pass per mini-batch, showing that computation can be traded for memory to obtain a more memory-efficient training algorithm with little extra computation.
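A back-of-the-envelope derivation of the O(√n) bound (a sketch assuming uniform per-layer activation memory, not the paper's full analysis): split the n layers into k segments, keep only the k segment-boundary activations, and recompute the at most n/k activations inside a segment when its backward pass runs:

\[
M(k) \approx k + \frac{n}{k}, \qquad
\frac{dM}{dk} = 1 - \frac{n}{k^2} = 0 \;\Rightarrow\; k = \sqrt{n}, \qquad
M(\sqrt{n}) \approx 2\sqrt{n} = O(\sqrt{n}).
\]

The recomputation across all segments amounts to at most one extra full forward pass per mini-batch, matching the stated computational cost.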
Proceedings Article · DOI

The Architectural Implications of Autonomous Driving: Constraints and Acceleration

TL;DR: With accelerator-based designs, this work is able to build an end-to-end autonomous driving system that meets all the design constraints, and explore the trade-offs among performance, power and the higher accuracy enabled by higher resolution cameras.
Proceedings Article · DOI

Superneurons: dynamic GPU memory management for training deep neural networks

TL;DR: SuperNeurons proposes a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity, reducing network-wide peak memory usage to the maximum memory usage among individual layers.
Proceedings Article · DOI

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

TL;DR: In this article, the authors present a vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-memory processing cores tailored for DL tensor operations.
References
More filters
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting residual networks won first place in the ILSVRC 2015 classification task.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of convolutional network depth on accuracy in the large-scale image recognition setting and showed that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article · DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for document recognition; gradient-based learning is shown to synthesize complex decision surfaces that can classify high-dimensional patterns such as handwritten characters.
Proceedings Article

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

TL;DR: Deep Compression proposes a three-stage pipeline of pruning, trained quantization, and Huffman coding that reduces the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Proceedings Article

Learning both weights and connections for efficient neural networks

TL;DR: In this paper, the authors proposed a three-step method that learns only the important connections, reducing the storage and computation required by neural networks by an order of magnitude without affecting their accuracy.
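The pruning step of this three-step method (train, prune low-magnitude connections, retrain) is simple to express on the GPU. Below is a minimal CUDA sketch of magnitude pruning; the kernel name, threshold, and launch configuration are hypothetical, illustrative choices rather than the authors' code.

#include <cuda_runtime.h>

// Zero out connections whose magnitude falls below a threshold.
// After pruning, training continues on the surviving weights
// (the "retrain" step), with pruned entries held at zero.
__global__ void prune_below(float *w, int n, float threshold) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && fabsf(w[i]) < threshold) {
    w[i] = 0.0f;
  }
}

// Usage sketch, assuming d_weights is a device array of n weights:
//   prune_below<<<(n + 255) / 256, 256>>>(d_weights, n, 0.01f);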