Proceedings ArticleDOI
Weighted-Entropy-Based Quantization for Deep Neural Networks
Eunhyeok Park, Junwhan Ahn, Sungjoo Yoo
pp. 7197–7205
TL;DR: This paper proposes a novel method for quantizing weights and activations based on the concept of weighted entropy, which achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss.
Abstract
Quantization is considered one of the most effective methods for optimizing the inference cost of neural network models for deployment to mobile and embedded systems, which have tight resource constraints. In such approaches, it is critical to provide low-cost quantization under a tight accuracy loss constraint (e.g., 1%). In this paper, we propose a novel method for quantizing weights and activations based on the concept of weighted entropy. Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy. This facilitates much more flexible exploitation of the accuracy-performance trade-off provided by different levels of quantization. Moreover, our scheme provides an automated quantization flow based on conventional training algorithms, which greatly reduces the design-time effort to quantize the network. According to our extensive evaluations based on practical neural network models for image classification (AlexNet, GoogLeNet and ResNet-50/101), object detection (R-FCN with 50-layer ResNet), and language modeling (an LSTM network), our method achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss. Also, compared to existing quantization schemes, ours provides higher accuracy with a similar resource constraint and requires much lower design effort.
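As a rough illustration of the paper's core idea, the sketch below computes the weighted entropy of one candidate clustering of weight values, using squared magnitude as the importance measure. The function name, the boundary-list representation, and the specific importance function are simplifying assumptions for illustration, not the paper's exact formulation; in the full method, cluster boundaries are searched to maximize this quantity for a target bit width.

```python
import math

def weighted_entropy(weights, boundaries):
    """Weighted entropy of a clustering of `weights` (illustrative sketch).

    Each weight contributes its *importance* (here, squared magnitude)
    rather than a raw count, and the entropy is taken over the relative
    importance mass of each cluster.
    """
    importance = [w * w for w in weights]
    total = sum(importance)

    # Assign each weight to a cluster index via the sorted boundaries.
    def cluster(w):
        idx = 0
        for b in boundaries:
            if w >= b:
                idx += 1
        return idx

    # Accumulate the importance mass of each cluster.
    mass = {}
    for w, imp in zip(weights, importance):
        c = cluster(w)
        mass[c] = mass.get(c, 0.0) + imp

    # Weighted entropy: -sum over clusters of P_n * log(P_n),
    # where P_n is cluster n's share of total importance.
    return -sum((m / total) * math.log(m / total)
                for m in mass.values() if m > 0)
```

For example, four weights split symmetrically into two clusters of equal importance give entropy log 2; a quantizer would prefer boundary placements that keep this value high, so that important weight values are spread across the available quantization levels.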
Citations
Posted Content
PACT: Parameterized Clipping Activation for Quantized Neural Networks
Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
TL;DR: It is shown, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets.
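The clipping idea behind PACT can be sketched in a few lines. In PACT the clipping level alpha is a learnable parameter trained by backpropagation alongside the weights; here it is a fixed float, and the function name and uniform rounding scheme are illustrative assumptions.

```python
def pact_quantize(x, alpha, bits):
    # Clip the activation to [0, alpha]; in PACT, alpha is learned
    # during training (fixed here for illustration).
    clipped = min(max(x, 0.0), alpha)
    # Uniform quantization to 2^bits - 1 levels within [0, alpha].
    levels = (1 << bits) - 1
    return round(clipped * levels / alpha) * alpha / levels
```

Because activations are bounded by alpha before quantization, the quantization error no longer depends on rare large outliers, which is what makes aggressive 4-bit activation quantization viable.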
Book ChapterDOI
A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
TL;DR: A systematic weight pruning framework of DNNs using the alternating direction method of multipliers (ADMM) is presented, which can reduce the total computation by five times compared with the prior work and achieves a fast convergence rate.
Proceedings ArticleDOI
Variational Convolutional Neural Network Pruning
TL;DR: Variational technique is introduced to estimate distribution of a newly proposed parameter, called channel saliency, based on which redundant channels can be removed from model via a simple criterion, and results in significant size reduction and computation saving.
Proceedings ArticleDOI
Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss
Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Jae-Joon Han, Youngjun Kwak, Sung Ju Hwang, Changkyu Choi
TL;DR: In this article, a quantization interval learning (QIL) method is proposed to quantize activations and weights via a trainable quantizer that transforms and discretizes them.
Journal ArticleDOI
Pruning and quantization for deep neural network acceleration: A survey
TL;DR: A survey on two types of network compression: pruning and quantization is provided, which compare current techniques, analyze their strengths and weaknesses, provide guidance for compressing networks, and discuss possible future compression techniques.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: AlexNet, a deep convolutional neural network, achieved state-of-the-art ImageNet classification performance; it consists of five convolutional layers, some followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
Proceedings ArticleDOI
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Posted Content
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: Faster R-CNN as discussed by the authors proposes a Region Proposal Network (RPN) that shares convolutional features with the detection network to generate high-quality region proposals, which are then used by Fast R-CNN for detection.