Open Access Posted Content

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

TLDR
HAWQ is a second-order quantization method that selects the relative quantization precision of each layer based on the layer's Hessian spectrum, and provides a deterministic fine-tuning order for quantizing layers.
Abstract
Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers, based on second-order information. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art shows that we can achieve similar/better accuracy with $8\times$ activation compression ratio on ResNet20, as compared to DNAS~\cite{wu2018mixed}, and up to $1\%$ higher accuracy with up to $14\%$ smaller models on ResNet50 and Inception-V3, compared to recently proposed methods of RVQuant~\cite{park2018value} and HAQ~\cite{wang2018haq}. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above $68\%$ top1 accuracy on ImageNet.
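
To make the layer-selection idea concrete, here is a minimal PyTorch sketch of the kind of computation involved, assuming a model and a single mini-batch; the helpers `top_hessian_eigenvalue` and `rank_layers_by_sensitivity` are illustrative names, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): estimate each layer's sensitivity
# as the top eigenvalue of its block Hessian via power iteration on
# Hessian-vector products, then rank layers so more sensitive layers keep
# higher precision.
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Power iteration on the Hessian block of `loss` w.r.t. one layer's `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((vi * vi).sum() for vi in v))
        v = [vi / (norm + 1e-12) for vi in v]
        # Hessian-vector product via a second backward pass
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((hvi * vi).sum() for hvi, vi in zip(hv, v)).item()
        v = [hvi.detach() for hvi in hv]
    return eig

def rank_layers_by_sensitivity(model, criterion, x, y):
    loss = criterion(model(x), y)
    scores = {}
    for name, module in model.named_modules():
        params = [p for p in module.parameters(recurse=False) if p.requires_grad]
        if params:
            scores[name] = top_hessian_eigenvalue(loss, params)
    # Larger top eigenvalue -> more sensitive -> assign more bits.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```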


Citations
Journal Article (DOI)

Pre-trained Models for Natural Language Processing: A Survey

TL;DR: The emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era; this survey provides a comprehensive review of PTMs for NLP.
Proceedings Article (DOI)

Forward and Backward Information Retention for Accurate Binary Neural Networks

TL;DR: The proposed Information Retention Network (IR-Net) is the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization.
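
As general background on how binary networks keep gradients flowing through the non-differentiable sign function (this is a generic sketch, not IR-Net's specific scheme), a minimal sign binarization with a straight-through estimator in PyTorch might look like:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization in the forward pass; straight-through estimator
    (clipped identity) for the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (clipped identity).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage: binarize the weights of a linear layer during the forward pass.
w = torch.randn(8, 4, requires_grad=True)
x = torch.randn(2, 4)
y = x @ BinarizeSTE.apply(w).t()
y.sum().backward()  # gradients flow back to w via the STE
```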
Posted Content

PyHessian: Neural Networks Through the Lens of the Hessian

TL;DR: PyHessian is a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks; it yields finer-scale insights, showing that conventional wisdom is sometimes validated and in other cases simply incorrect.
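
The kind of computation such a framework builds on can be sketched with Hutchinson's trace estimator over Hessian-vector products; this is an illustrative snippet, not PyHessian's API:

```python
import torch

def hessian_trace(loss, params, probes=32):
    """Hutchinson estimator: E[z^T H z] = tr(H) for Rademacher z.
    Uses double backpropagation to form Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimates = []
    for _ in range(probes):
        # Rademacher probe vectors with entries +1 or -1
        zs = [(torch.rand_like(p) < 0.5).to(p.dtype) * 2 - 1 for p in params]
        hz = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        estimates.append(sum((z * h).sum() for z, h in zip(zs, hz)).item())
    return sum(estimates) / len(estimates)
```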
Posted Content

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

TL;DR: AdaHessian is introduced, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian, and it exhibits robustness towards its hyperparameters.
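
As a rough illustration of curvature-aware updates (not the released AdaHessian optimizer, which additionally uses averaging and momentum terms), a single step preconditioned by a Hutchinson estimate of the Hessian diagonal, D ≈ E[z ⊙ Hz], could look like:

```python
import torch

def diag_hessian_step(loss, params, lr=0.01, eps=1e-8):
    """One toy update step preconditioned by an estimate of the Hessian
    diagonal (z * Hz for a Rademacher probe z). Illustrative only."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [(torch.rand_like(p) < 0.5).to(p.dtype) * 2 - 1 for p in params]
    hzs = torch.autograd.grad(grads, params, grad_outputs=zs)
    with torch.no_grad():
        for p, g, z, hz in zip(params, grads, zs, hzs):
            diag = (z * hz).abs()        # estimated magnitude of the Hessian diagonal
            p -= lr * g / (diag + eps)   # curvature-aware step
```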
Posted Content

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

TL;DR: This paper proposes a Hessian-based method for quantizing BERT models to ultra-low precision (down to 2 bits), achieving performance comparable to the baseline with at most 3.3% degradation.
References
Proceedings Article (DOI)

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
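
The residual idea is compact in code; a simplified basic block (identity shortcut only, illustrative rather than the exact ResNet block) in PyTorch:

```python
import torch
from torch import nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x), where F is two
    3x3 convolutions. Identity shortcut only (same shape in and out)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # shortcut: the block only learns the residual F(x)
```

Stacking such blocks lets gradients flow through the identity shortcuts, which is what eases the training of much deeper networks.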
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art ImageNet classification performance was achieved with a deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
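
The described layout roughly corresponds to the following sketch (channel and kernel sizes follow a commonly used AlexNet-style configuration and are illustrative, not the exact original two-GPU layout):

```python
from torch import nn

# Five convolutional layers (some followed by max-pooling) and three
# fully-connected layers ending in 1000-way logits; softmax is applied in the loss.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)
```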
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Journal Article (DOI)

Gradient-based learning applied to document recognition

TL;DR: Multilayer networks trained with gradient-based learning can synthesize complex decision surfaces that classify high-dimensional patterns such as handwritten characters; the paper also introduces graph transformer networks (GTNs) for globally trained document recognition systems.
Proceedings Article (DOI)

Rethinking the Inception Architecture for Computer Vision

TL;DR: In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
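
One of the factorizations discussed, replacing an n x n convolution with a 1 x n followed by an n x 1 convolution, can be sketched as follows (an illustrative helper, not the paper's exact module):

```python
from torch import nn

def factorized_conv(in_ch, out_ch, n=3):
    """Replace an n x n convolution with a 1 x n followed by an n x 1
    convolution, reducing the weights per output filter from n*n to 2*n
    per input channel while keeping the same receptive field."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=(1, n), padding=(0, n // 2)),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=(n, 1), padding=(n // 2, 0)),
        nn.ReLU(),
    )
```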
Trending Questions (1)
Can you compare gguf to awq to gptq?

HAWQ (Hessian AWare Quantization) is compared to DNAS, RVQuant, and HAQ, showing similar/better accuracy and smaller models.