Open Access · Posted Content

Up or Down? Adaptive Rounding for Post-Training Quantization

TLDR
AdaRound is proposed, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. It outperforms rounding-to-nearest by a significant margin and establishes a new state-of-the-art for post-training quantization on several networks and tasks.
Abstract
When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.
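
The layer-wise local loss and soft relaxation described above can be pictured with a short sketch. The following is a minimal illustration in PyTorch, not the authors' reference implementation: the rectified-sigmoid parameterization, the annealing schedule, and all names and constants (rectified_sigmoid, adaround_layer, ZETA, GAMMA, lam) are assumptions made for this example.

    # Minimal sketch of adaptive ("up or down") rounding for one linear layer.
    # Constants and function names are illustrative, not the paper's code.
    import torch

    ZETA, GAMMA = 1.1, -0.1  # stretch parameters for the rectified sigmoid

    def rectified_sigmoid(v):
        # Soft rounding offset h(V) in [0, 1]; gradients flow through V.
        return torch.clamp(torch.sigmoid(v) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

    def adaround_layer(w, x, scale, steps=1000, lam=0.01, lr=1e-2):
        # Learn, per weight, whether to round down (h -> 0) or up (h -> 1) so that
        # the quantized layer reproduces the float output on calibration data x.
        w_floor = torch.floor(w / scale)
        v = torch.zeros_like(w, requires_grad=True)   # continuous rounding variable
        opt = torch.optim.Adam([v], lr=lr)
        fp_out = x @ w.t()                            # float reference output
        for step in range(steps):
            beta = 20.0 - 18.0 * step / steps         # anneal regularizer sharpness
            w_soft = (w_floor + rectified_sigmoid(v)) * scale
            rec_loss = ((x @ w_soft.t()) - fp_out).pow(2).mean()
            reg = (1.0 - (2.0 * rectified_sigmoid(v) - 1.0).abs().pow(beta)).sum()
            loss = rec_loss + lam * reg               # layer-wise local loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        # The regularizer pushes h(V) toward 0 or 1; round hard and return.
        return (w_floor + (rectified_sigmoid(v) >= 0.5).float()) * scale

    # Toy usage on random data with an illustrative quantization step size.
    w, x = torch.randn(32, 64), torch.randn(256, 64)
    w_q = adaround_layer(w, x, scale=0.1)

In the paper, the second-order Taylor expansion of the task loss is what motivates this kind of per-layer output-reconstruction objective in place of the full task loss.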


Citations
Journal Article

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

TL;DR: Two pipelines are introduced, advanced and light. The former minimizes the quantization error of each layer by optimizing its parameters over the calibration set, and uses integer programming to optimally allocate a bit-width to each layer while constraining accuracy degradation or model compression.
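
The bit-width allocation step can be pictured as a small integer program: choose a bit-width per layer that minimizes a summed per-layer error proxy under a total size budget. The sketch below is our illustration, not that paper's code, and replaces the integer-programming solver with exhaustive search over a few candidate bit-widths; allocate_bits and its arguments are hypothetical names.

    # Illustrative bit-width allocation under a model-size budget (toy version).
    from itertools import product

    def allocate_bits(layer_sizes, error, budget_bits, choices=(2, 4, 8)):
        # layer_sizes[i]: number of weights in layer i
        # error[i][b]:    measured quantization error of layer i at bit-width b
        # budget_bits:    total bit budget across all weights
        best, best_cost = None, float("inf")
        for assignment in product(choices, repeat=len(layer_sizes)):
            size = sum(n * b for n, b in zip(layer_sizes, assignment))
            if size > budget_bits:
                continue
            cost = sum(error[i][b] for i, b in enumerate(assignment))
            if cost < best_cost:
                best, best_cost = assignment, cost
        return best, best_cost

    # Toy usage: three layers, error table indexed as error[layer][bits].
    errs = [{2: 0.9, 4: 0.2, 8: 0.01},
            {2: 0.5, 4: 0.1, 8: 0.01},
            {2: 0.8, 4: 0.3, 8: 0.02}]
    print(allocate_bits([1000, 4000, 1000], errs, budget_bits=6 * 6000))

Exhaustive search is exponential in the number of layers; formulating the same choice as an integer program and handing it to a solver is what makes the allocation tractable for real networks.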
Proceedings Article

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

TL;DR: This work shows that ZeroQuant can reduce the precision of weights and activations to INT8 in a cost-free way for both BERT and GPT-3-style models with minimal accuracy impact, leading to up to 5.19x/4.16x speedup on those models compared to FP16 inference.
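
For concreteness, a generic symmetric per-tensor INT8 round-trip is sketched below; this is standard uniform quantization for illustration only, not ZeroQuant's specific quantization scheme or its fused inference kernels.

    # Generic symmetric per-tensor INT8 quantization round-trip (illustrative only).
    import torch

    def quantize_int8(w):
        scale = w.abs().max() / 127.0                      # largest magnitude maps to 127
        q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
        return q, scale

    def dequantize_int8(q, scale):
        return q.float() * scale                           # approximate reconstruction of w

    w = torch.randn(64, 64)
    q, s = quantize_int8(w)
    print((dequantize_int8(q, s) - w).abs().max())         # worst-case rounding error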
Journal Article

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

TL;DR: GPTQ quantizes GPT models with 175 billion parameters in approximately four GPU hours, reducing the bit-width down to 3 or 4 bits per weight with negligible accuracy degradation relative to the uncompressed baseline.
Journal Article

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

Elias Frantar, +1 more · 02 Jan 2023
TL;DR: SparseGPT prunes large-scale generative pretrained transformer (GPT) family models to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy.
Proceedings Article

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

TL;DR: A new compression framework is introduced that covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; their residual nets won 1st place in the ILSVRC 2015 classification task.
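
A minimal basic residual block, with illustrative channel sizes and names of our choosing, shows the identity shortcut at the core of the framework:

    # Minimal basic residual block: output = ReLU(F(x) + x); sizes are illustrative.
    import torch
    import torch.nn as nn

    class BasicBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = torch.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return torch.relu(out + x)   # identity shortcut eases optimization of deep nets

    block = BasicBlock(16)
    print(block(torch.randn(1, 16, 32, 32)).shape)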
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
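
The update rule itself is compact; the sketch below applies one Adam step to a plain tensor using the commonly cited default hyperparameters. It is our illustration of the published update, not the paper's pseudocode verbatim.

    # One Adam step: bias-corrected first and second moment estimates of the gradient.
    import torch

    def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad             # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * grad ** 2        # second moment (uncentered variance)
        m_hat = m / (1 - b1 ** t)                # bias correction for step t >= 1
        v_hat = v / (1 - b2 ** t)
        param = param - lr * m_hat / (v_hat.sqrt() + eps)
        return param, m, v

    p, m, v = torch.zeros(3), torch.zeros(3), torch.zeros(3)
    p, m, v = adam_step(p, torch.tensor([0.1, -0.2, 0.3]), m, v, t=1)

In practice one would use torch.optim.Adam rather than hand-rolling the update; the sketch only makes the moment estimates and bias correction explicit.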
Journal Article

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Journal Article

The Pascal Visual Object Classes Challenge: A Retrospective

TL;DR: A review of the Pascal Visual Object Classes challenge from 2008-2012 is presented, with an appraisal of the aspects of the challenge that worked well and those that could be improved in future challenges.