Proceedings ArticleDOI

Weighted-Entropy-Based Quantization for Deep Neural Networks

TL;DR
This paper proposes a novel method for quantizing weights and activations based on the concept of weighted entropy, which achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss.
Abstract
Quantization is considered one of the most effective methods for optimizing the inference cost of neural network models for deployment on mobile and embedded systems, which have tight resource constraints. In such approaches, it is critical to provide low-cost quantization under a tight accuracy loss constraint (e.g., 1%). In this paper, we propose a novel method for quantizing weights and activations based on the concept of weighted entropy. Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy. This facilitates much more flexible exploitation of the accuracy-performance trade-off offered by different levels of quantization. Moreover, our scheme provides an automated quantization flow based on conventional training algorithms, which greatly reduces the design-time effort to quantize the network. According to our extensive evaluations based on practical neural network models for image classification (AlexNet, GoogLeNet, and ResNet-50/101), object detection (R-FCN with 50-layer ResNet), and language modeling (an LSTM network), our method achieves significant reductions in both model size and the amount of computation with minimal accuracy loss. Also, compared to existing quantization schemes, ours provides higher accuracy under a similar resource constraint and requires much lower design effort.
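To make the idea concrete, below is a minimal NumPy sketch of weighted-entropy-based clustering of weights. It assumes, as in the paper, that the importance of a weight is its squared magnitude; the boundary-search procedure and the representative-value rule used by the authors are not reproduced here, so the equal-size initial boundaries and the signed square-root representative are illustrative assumptions only.

```python
import numpy as np

def weighted_entropy(weights, boundaries):
    """Weighted entropy S = -sum_n w_n * P_n * log(P_n) of a clustering,
    where P_n is the relative frequency of cluster n and w_n its relative
    importance; the importance of an individual weight is its squared
    magnitude, as in the paper."""
    importance = weights ** 2
    order = np.argsort(importance)              # weights sorted by importance
    clusters = np.split(order, boundaries)      # boundaries index into the sorted order
    total = importance.sum()
    s = 0.0
    for idx in clusters:
        if len(idx) == 0:
            continue
        p_n = len(idx) / len(weights)            # relative frequency of the cluster
        w_n = importance[idx].sum() / total      # relative importance of the cluster
        s += -w_n * p_n * np.log(p_n)
    return s

def quantize_by_clusters(weights, boundaries):
    """Map every weight to its cluster's representative magnitude (square
    root of the cluster's mean importance), keeping the original sign --
    a simplifying assumption, not the authors' exact rule."""
    importance = weights ** 2
    order = np.argsort(importance)
    q = np.empty_like(weights)
    for idx in np.split(order, boundaries):
        if len(idx) == 0:
            continue
        q[idx] = np.sign(weights[idx]) * np.sqrt(importance[idx].mean())
    return q

# Toy usage: k-bit quantization uses 2**k clusters.  The paper searches for the
# boundaries that maximize weighted entropy; here we simply start from
# equally sized clusters for illustration.
w = np.random.randn(1024).astype(np.float32)
k = 3
bounds = np.linspace(0, len(w), 2 ** k + 1, dtype=int)[1:-1]
print("weighted entropy:", weighted_entropy(w, bounds))
w_q = quantize_by_clusters(w, bounds)
```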


Citations
Posted Content

PACT: Parameterized Clipping Activation for Quantized Neural Networks

TL;DR: It is shown, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets.
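A minimal NumPy sketch of the clipping-plus-quantization idea behind PACT, assuming a fixed clipping parameter alpha (in the paper, alpha is trained jointly with the network weights):

```python
import numpy as np

def pact_activation(x, alpha, k):
    """Clip activations to the range [0, alpha], then quantize them
    uniformly to k bits.  In the paper alpha is a trainable parameter
    optimized with the task loss; here it is a fixed constant."""
    y = np.clip(x, 0.0, alpha)                 # parameterized clipping
    scale = (2 ** k - 1) / alpha
    return np.round(y * scale) / scale         # k-bit uniform quantization levels

x = np.random.randn(8).astype(np.float32) * 3.0
print(pact_activation(x, alpha=2.0, k=4))
```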
Book ChapterDOI

A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers

TL;DR: A systematic weight pruning framework for DNNs using the alternating direction method of multipliers (ADMM) is presented, which reduces total computation fivefold compared with prior work and achieves a fast convergence rate.
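As a rough illustration of the ADMM formulation, the sketch below shows only the projection (Z-update) step, i.e. the Euclidean projection of a weight tensor onto an "at most num_keep nonzeros" constraint; the alternating retraining step with the quadratic penalty is omitted, and the function name is ours:

```python
import numpy as np

def project_to_sparsity(w, num_keep):
    """Euclidean projection onto the constraint 'at most num_keep nonzero
    entries': keep the largest-magnitude entries and zero out the rest.
    This is the Z-update inside an ADMM weight-pruning loop; the W-update
    (regular training plus a quadratic penalty) is not shown."""
    flat = w.reshape(-1)
    keep = np.argsort(np.abs(flat))[-num_keep:]   # indices of the largest magnitudes
    z = np.zeros_like(flat)
    z[keep] = flat[keep]
    return z.reshape(w.shape)

w = np.random.randn(4, 4)
print(project_to_sparsity(w, num_keep=4))
```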
Proceedings ArticleDOI

Variational Convolutional Neural Network Pruning

TL;DR: A variational technique is introduced to estimate the distribution of a newly proposed parameter, called channel saliency, based on which redundant channels can be removed from the model via a simple criterion, resulting in significant size reduction and computation savings.
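The snippet below is only a generic illustration of channel removal: given per-channel saliency scores, channels below a threshold are dropped. The paper's actual contribution, the variational estimation of the saliency distribution, is not reproduced, and the threshold rule used here is an assumption:

```python
import numpy as np

def prune_channels(feature_map, saliency, threshold):
    """Drop the channels whose saliency falls below a threshold and return
    the pruned feature map plus the indices of the kept channels."""
    keep = saliency > threshold
    return feature_map[:, keep, :, :], np.where(keep)[0]

fmap = np.random.randn(2, 8, 14, 14)    # (batch, channels, height, width)
sal = np.random.rand(8)                 # per-channel saliency scores (placeholder values)
pruned, kept = prune_channels(fmap, sal, threshold=0.5)
print(pruned.shape, kept)
```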
Proceedings ArticleDOI

Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss

TL;DR: In this article, a quantization interval learning (QIL) method is proposed to quantize activations and weights via a trainable quantizer that transforms and discretizes them.
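A simplified sketch of the interval-based transform; the paper's additional non-linearity exponent and the task-loss-driven training of the interval parameters (center, distance) are omitted, so this only shows the forward mapping under those assumptions:

```python
import numpy as np

def interval_quantize(x, center, distance, k):
    """Map magnitudes inside [center - distance, center + distance] linearly
    to [0, 1], prune everything below the interval to 0, clip everything
    above to 1, then discretize to k-bit signed levels."""
    t = (np.abs(x) - (center - distance)) / (2.0 * distance)
    t = np.clip(t, 0.0, 1.0)                   # prune / clip outside the interval
    levels = 2 ** (k - 1) - 1                  # symmetric signed levels
    return np.sign(x) * np.round(t * levels) / levels

x = np.linspace(-2.0, 2.0, 9)
print(interval_quantize(x, center=0.7, distance=0.5, k=3))
```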
Journal ArticleDOI

Pruning and quantization for deep neural network acceleration: A survey

TL;DR: A survey of two types of network compression, pruning and quantization, is provided; it compares current techniques, analyzes their strengths and weaknesses, provides guidance for compressing networks, and discusses possible future compression techniques.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won first place on the ILSVRC 2015 classification task.
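A shape-level NumPy sketch of the core idea, the identity shortcut y = F(x) + x; real residual blocks use convolutions and batch normalization, so this is illustrative only:

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), with F a small two-layer transform standing in
    for the convolutional residual function."""
    h = np.maximum(0.0, x @ w1)      # first layer + ReLU
    f = h @ w2                       # residual function F(x)
    return np.maximum(0.0, f + x)    # identity shortcut, then ReLU

x = np.random.randn(4, 16)
w1 = 0.1 * np.random.randn(16, 16)
w2 = 0.1 * np.random.randn(16, 16)
print(residual_block(x, w1, w2).shape)   # (4, 16)
```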
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully connected layers with a final 1000-way softmax, achieved state-of-the-art image classification performance.
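A PyTorch-style sketch of the architecture as described above; layer sizes follow the original paper, but local response normalization, dropout, and the two-GPU split are omitted, so treat it as a shape-level illustration rather than a faithful reimplementation:

```python
import torch
import torch.nn as nn

alexnet_sketch = nn.Sequential(
    # five convolutional layers, some followed by max pooling
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    # three fully connected layers; the 1000-way softmax is applied at inference
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)

logits = alexnet_sketch(torch.randn(1, 3, 227, 227))
print(logits.shape)   # torch.Size([1, 1000])
```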
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called "ImageNet" is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than existing image datasets.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception, as discussed in this paper, is a deep convolutional neural network architecture that set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
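A rough PyTorch sketch of an Inception-style module: parallel 1x1, 3x3 and 5x5 convolutions plus a pooling branch, concatenated along the channel axis. The channel counts here are arbitrary, and the 1x1 reductions the original GoogLeNet places before the larger filters are left out:

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Parallel convolution branches whose outputs are concatenated."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # concatenate the four branches along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionSketch(32)(x).shape)   # torch.Size([1, 64, 28, 28])
```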
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN, as discussed by the authors, proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
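A PyTorch sketch of the RPN head only: a shared 3x3 convolution followed by two sibling 1x1 convolutions that predict, per anchor and location, an objectness score pair and four box-regression offsets. Anchor generation, proposal sampling and the downstream Fast R-CNN stage are omitted, and the class name is ours:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Shared 3x3 conv plus sibling 1x1 convs for objectness and box deltas."""
    def __init__(self, in_ch=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 512, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(512, 2 * num_anchors, kernel_size=1)   # object / not object
        self.reg = nn.Conv2d(512, 4 * num_anchors, kernel_size=1)   # box offsets

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.cls(h), self.reg(h)

scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
print(scores.shape, deltas.shape)   # (1, 18, 38, 50) (1, 36, 38, 50)
```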