Open Access · Posted Content
vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
TL;DR: In this article, the authors propose a runtime memory manager that virtualizes the memory usage of DNNs so that both GPU and CPU memory can be used simultaneously to train larger DNNs.
Abstract:
The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a significant reduction in the memory requirements of DNNs. Similar experiments on VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the memory efficiency of our proposal. vDNN enables VGG-16 with batch size 256 (requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card containing 12 GB of memory, with an 18% performance loss compared to a hypothetical, oracular GPU with enough memory to hold the entire DNN.
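To make the idea of moving feature maps between GPU and CPU memory concrete, the following is a toy accounting model, not the authors' vDNN implementation. It assumes, hypothetically, that each layer's backward step needs only its own input feature map and the incoming gradient (roughly true for convolution and fully connected layers), and it ignores weights, cuDNN workspaces, allocator fragmentation, and the overlap of PCIe transfers with compute that determines vDNN's actual performance cost. The names `GpuTracker` and `simulate` and the feature-map sizes are illustrative inventions. The sketch compares peak GPU memory when every feature map stays resident against a policy that offloads each layer's input to host memory after the forward step and prefetches it back before the backward step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GpuTracker:
    """Hypothetical helper: tracks live GPU buffers (MB) and the peak total."""
    live: Dict[str, float] = field(default_factory=dict)
    peak: float = 0.0

    def alloc(self, name: str, size: float) -> None:
        self.live[name] = size
        self.peak = max(self.peak, sum(self.live.values()))

    def free(self, name: str) -> None:
        self.live.pop(name, None)


def simulate(fmap_mb: List[float], offload: bool) -> float:
    """Return peak GPU memory (MB) for one toy training step.

    fmap_mb[i] is the size of the feature map entering layer i, and
    fmap_mb[-1] is the network output, so layer i maps x{i} -> x{i+1}.
    Assumption: layer i's backward step needs only x{i} and dx{i+1}.
    """
    n = len(fmap_mb) - 1  # number of layers
    gpu = GpuTracker()
    gpu.alloc("x0", fmap_mb[0])

    # Forward pass: produce each output; optionally evict the consumed input.
    for i in range(n):
        gpu.alloc(f"x{i+1}", fmap_mb[i + 1])
        if offload:
            gpu.free(f"x{i}")  # input copied to host memory, GPU copy released

    # Backward pass, from the last layer to the first.
    gpu.alloc(f"dx{n}", fmap_mb[n])  # loss gradient w.r.t. the network output
    gpu.free(f"x{n}")                # the output itself is no longer needed
    for i in reversed(range(n)):
        if offload:
            gpu.alloc(f"x{i}", fmap_mb[i])  # prefetch layer input from host
        gpu.alloc(f"dx{i}", fmap_mb[i])     # gradient w.r.t. layer i's input
        gpu.free(f"dx{i+1}")                # incoming gradient consumed
        gpu.free(f"x{i}")                   # layer input no longer needed
    return gpu.peak


if __name__ == "__main__":
    sizes = [100, 400, 400, 200, 50]  # hypothetical feature-map sizes in MB
    print(f"baseline {simulate(sizes, offload=False):.0f} MB, "
          f"offload {simulate(sizes, offload=True):.0f} MB")
    # prints: baseline 1500 MB, offload 1200 MB
```

In this model the offloading policy bounds resident memory by a few adjacent buffers rather than by the sum of all feature maps, so the savings grow with network depth: for six equal 300 MB feature maps the peak drops from 2100 MB to 900 MB. The price, which this sketch does not model, is the host-device traffic that a real runtime must hide behind computation.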
Citations
Posted Content
Training Deep Nets with Sublinear Memory Cost
TL;DR: This work designs an algorithm that costs O(√n) memory to train an n-layer network at the computational cost of only one extra forward pass per mini-batch, showing that computation can be traded for memory to obtain a more memory-efficient training algorithm with little extra computation.
Proceedings ArticleDOI
The Architectural Implications of Autonomous Driving: Constraints and Acceleration
TL;DR: With accelerator-based designs, this work is able to build an end-to-end autonomous driving system that meets all the design constraints, and explore the trade-offs among performance, power and the higher accuracy enabled by higher resolution cameras.
Proceedings ArticleDOI
SuperNeurons: dynamic GPU memory management for training deep neural networks
TL;DR: SuperNeurons is a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity by reducing network-wide peak memory usage to the maximal memory usage among layers.
Proceedings ArticleDOI
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
TL;DR: In this article, the authors present a vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-memory processing cores tailored for DL tensor operations.
Proceedings ArticleDOI
Pathways: Asynchronous Distributed Dataflow for ML
Paul Barham, Aakanksha Chowdhery, Jeffrey Dean, Sanjay Ghemawat, Steven Hand, D. Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place on the ILSVRC 2015 classification task.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: This article shows that gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters, and proposes graph transformer networks (GTNs), which allow multi-module document recognition systems to be trained globally with gradient-based methods.
Proceedings Article
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
TL;DR: Deep Compression is a three-stage pipeline (pruning, trained quantization, and Huffman coding) that reduces the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Proceedings Article
Learning both weights and connections for efficient neural networks
TL;DR: In this paper, the authors propose a three-step method that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections.