Open Access Posted Content

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

TLDR
HAWQ is a second-order quantization method that selects the relative quantization precision of each layer based on the layer's Hessian spectrum, and provides a deterministic fine-tuning order for quantizing layers.
Abstract
Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers, based on second-order information. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art shows that we can achieve similar/better accuracy with $8\times$ activation compression ratio on ResNet20, as compared to DNAS~\cite{wu2018mixed}, and up to $1\%$ higher accuracy with up to $14\%$ smaller models on ResNet50 and Inception-V3, compared to recently proposed methods of RVQuant~\cite{park2018value} and HAQ~\cite{wang2018haq}. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above $68\%$ top1 accuracy on ImageNet.
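
To make the layer-selection idea concrete, here is a minimal PyTorch sketch of the kind of computation involved, assuming a model and a single mini-batch; the helpers `top_hessian_eigenvalue` and `rank_layers_by_sensitivity` are illustrative names, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): estimate each layer's sensitivity
# as the top eigenvalue of its block Hessian via power iteration on
# Hessian-vector products, then rank layers so more sensitive layers keep
# higher precision.
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Power iteration on the Hessian block of `loss` w.r.t. one layer's `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((vi * vi).sum() for vi in v))
        v = [vi / (norm + 1e-12) for vi in v]
        # Hessian-vector product via a second backward pass
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((hvi * vi).sum() for hvi, vi in zip(hv, v)).item()
        v = [hvi.detach() for hvi in hv]
    return eig

def rank_layers_by_sensitivity(model, criterion, x, y):
    loss = criterion(model(x), y)
    scores = {}
    for name, module in model.named_modules():
        params = [p for p in module.parameters(recurse=False) if p.requires_grad]
        if params:
            scores[name] = top_hessian_eigenvalue(loss, params)
    # Larger top eigenvalue -> more sensitive -> assign more bits.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```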


Citations
Journal Article (DOI)

Pre-trained Models for Natural Language Processing: A Survey

TL;DR: The emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era; this survey provides a comprehensive review of PTMs for NLP.
Proceedings Article (DOI)

Forward and Backward Information Retention for Accurate Binary Neural Networks

TL;DR: The proposed Information Retention Network (IR-Net) is the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization.
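
As general background on how binary networks keep gradients flowing through the non-differentiable sign function (this is a generic sketch, not IR-Net's specific scheme), a minimal sign binarization with a straight-through estimator in PyTorch might look like:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization in the forward pass; straight-through estimator
    (clipped identity) for the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (clipped identity).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage: binarize the weights of a linear layer during the forward pass.
w = torch.randn(8, 4, requires_grad=True)
x = torch.randn(2, 4)
y = x @ BinarizeSTE.apply(w).t()
y.sum().backward()  # gradients flow back to w via the STE
```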
Posted Content

PyHessian: Neural Networks Through the Lens of the Hessian

TL;DR: PyHessian is a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks; it yields finer-scale insights, showing that conventional wisdom is sometimes validated and in other cases simply incorrect.
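
The kind of computation such a framework builds on can be sketched with Hutchinson's trace estimator over Hessian-vector products; this is an illustrative snippet, not PyHessian's API:

```python
import torch

def hessian_trace(loss, params, probes=32):
    """Hutchinson estimator: E[z^T H z] = tr(H) for Rademacher z.
    Uses double backpropagation to form Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimates = []
    for _ in range(probes):
        # Rademacher probe vectors with entries +1 or -1
        zs = [(torch.rand_like(p) < 0.5).to(p.dtype) * 2 - 1 for p in params]
        hz = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        estimates.append(sum((z * h).sum() for z, h in zip(zs, hz)).item())
    return sum(estimates) / len(estimates)
```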
Posted Content

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

TL;DR: AdaHessian is introduced, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian, and it exhibits robustness towards its hyperparameters.
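
As a rough illustration of curvature-aware updates (not the released AdaHessian optimizer, which additionally uses averaging and momentum terms), a single step preconditioned by a Hutchinson estimate of the Hessian diagonal, D ≈ E[z ⊙ Hz], could look like:

```python
import torch

def diag_hessian_step(loss, params, lr=0.01, eps=1e-8):
    """One toy update step preconditioned by an estimate of the Hessian
    diagonal (z * Hz for a Rademacher probe z). Illustrative only."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [(torch.rand_like(p) < 0.5).to(p.dtype) * 2 - 1 for p in params]
    hzs = torch.autograd.grad(grads, params, grad_outputs=zs)
    with torch.no_grad():
        for p, g, z, hz in zip(params, grads, zs, hzs):
            diag = (z * hz).abs()        # estimated magnitude of the Hessian diagonal
            p -= lr * g / (diag + eps)   # curvature-aware step
```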
Posted Content

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

TL;DR: This paper proposes a Hessian-based method for quantizing BERT models to ultra-low precision (down to 2 bits), achieving performance comparable to the baseline with at most 3.3% degradation.
References
Proceedings Article (DOI)

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
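
The residual idea is compact in code; a simplified basic block (identity shortcut only, illustrative rather than the exact ResNet block) in PyTorch:

```python
import torch
from torch import nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x), where F is two
    3x3 convolutions. Identity shortcut only (same shape in and out)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # shortcut: the block only learns the residual F(x)
```

Stacking such blocks lets gradients flow through the identity shortcuts, which is what eases the training of much deeper networks.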
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art ImageNet classification performance was achieved with a deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
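
The described layout roughly corresponds to the following sketch (channel and kernel sizes follow a commonly used AlexNet-style configuration and are illustrative, not the exact original two-GPU layout):

```python
from torch import nn

# Five convolutional layers (some followed by max-pooling) and three
# fully-connected layers ending in 1000-way logits; softmax is applied in the loss.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)
```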
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Journal Article (DOI)

Gradient-based learning applied to document recognition

TL;DR: Multilayer networks trained with gradient-based learning can synthesize complex decision surfaces that classify high-dimensional patterns such as handwritten characters; the paper also introduces graph transformer networks (GTNs) for globally trained document recognition systems.
Proceedings Article (DOI)

Rethinking the Inception Architecture for Computer Vision

TL;DR: In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
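
One of the factorizations discussed, replacing an n x n convolution with a 1 x n followed by an n x 1 convolution, can be sketched as follows (an illustrative helper, not the paper's exact module):

```python
from torch import nn

def factorized_conv(in_ch, out_ch, n=3):
    """Replace an n x n convolution with a 1 x n followed by an n x 1
    convolution, reducing the weights per output filter from n*n to 2*n
    per input channel while keeping the same receptive field."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=(1, n), padding=(0, n // 2)),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=(n, 1), padding=(n // 2, 0)),
        nn.ReLU(),
    )
```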
Trending Questions (1)
Can you compare gguf to awq to gptq?

HAWQ (Hessian AWare Quantization) is compared to DNAS, RVQuant, and HAQ, showing similar/better accuracy and smaller models.