scispace - formally typeset
Open AccessProceedings Article

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Reads0
Chats0
TLDR
Deep Compression as mentioned in this paper proposes a three-stage pipeline: pruning, quantization, and Huffman coding to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Abstract
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9x to 13x; Quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49x from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. Benchmarked on CPU, GPU and mobile GPU, compressed network has 3x to 4x layerwise speedup and 3x to 7x better energy efficiency.

read more

Citations
More filters
Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
Posted Content

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mingxing Tan, +1 more
- 28 May 2019 - 
TL;DR: A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.
Posted Content

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

TL;DR: This work proposes a small DNN architecture called SqueezeNet, which achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters and is able to compress to less than 0.5MB (510x smaller than AlexNet).
Posted Content

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delicited models.
Proceedings ArticleDOI

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

TL;DR: ShuffleNet as discussed by the authors utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy, and achieves an actual speedup over AlexNet while maintaining comparable accuracy.
References
More filters
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Journal ArticleDOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Posted Content

Fast R-CNN

TL;DR: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection that builds on previous work to efficiently classify object proposals using deep convolutional networks.