Up or Down? Adaptive Rounding for Post-Training Quantization

Open Access, Posted Content

Markus Nagel, +4 more

- 22 Apr 2020 -

arXiv: Learning

TLDR

AdaRound is proposed, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. It outperforms rounding-to-nearest by a significant margin and establishes a new state-of-the-art for post-training quantization on several networks and tasks.

Abstract:

When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.
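
The claim that rounding-to-nearest is not optimal can be checked on a toy layer. The sketch below is only an illustration under stated assumptions, not the AdaRound algorithm: it replaces the paper's soft relaxation with an exhaustive search over every up/down choice, which is feasible only for a handful of weights. The weights, the quantization step `scale`, the calibration inputs, and all function names are hypothetical.

```python
import itertools
import math
import random


def layer_output_error(w_ref, w_q, inputs):
    """Mean squared difference between the outputs w_ref.x and w_q.x."""
    total = 0.0
    for x in inputs:
        diff = sum((a - b) * xi for a, b, xi in zip(w_ref, w_q, x))
        total += diff * diff
    return total / len(inputs)


def round_to_nearest(w, scale):
    """Baseline: snap each weight to its nearest grid point."""
    return [scale * math.floor(wi / scale + 0.5) for wi in w]


def best_up_down_rounding(w, scale, inputs):
    """Try every floor/ceil assignment (2**len(w) candidates) and keep the
    one with the lowest layer-wise output error. Brute force stands in for
    the soft relaxation here; it does not scale to real layers."""
    floors = [math.floor(wi / scale) for wi in w]
    best_w, best_err = None, float("inf")
    for bits in itertools.product((0, 1), repeat=len(w)):
        cand = [scale * (f + b) for f, b in zip(floors, bits)]
        err = layer_output_error(w, cand, inputs)
        if err < best_err:
            best_w, best_err = cand, err
    return best_w, best_err


random.seed(0)
scale = 0.25  # hypothetical quantization step
weights = [0.30, -0.62, 0.11, 0.87, -0.40, 0.55]  # hypothetical weights
# Hypothetical unlabelled calibration data: strongly correlated features.
inputs = [[1.0 + 0.1 * random.gauss(0, 1) for _ in weights] for _ in range(32)]

nearest = round_to_nearest(weights, scale)
err_nearest = layer_output_error(weights, nearest, inputs)
adaptive, err_adaptive = best_up_down_rounding(weights, scale, inputs)

print("nearest :", nearest, "error:", err_nearest)
print("adaptive:", adaptive, "error:", err_adaptive)
```

Because the inputs are correlated, the per-weight rounding errors interact in the layer output: rounding one weight the "wrong" way can cancel the accumulated error of the others. Rounding-to-nearest is itself one of the candidates searched, so the adaptive result can never be worse on the calibration data.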

Citations

Journal Article

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

Itay Hubara, +4 more

- 04 May 2021 -

arXiv: Learning

TL;DR: Two pipelines are introduced, advanced and light, where the former involves minimizing the quantization errors of each layer by optimizing its parameters over the calibration set and using integer programming to optimally allocate the desired bit-width for each layer while constraining accuracy degradation or model compression.

Proceedings Article

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Zhewei Yao, +5 more

TL;DR: This work is able to show that ZeroQuant can reduce the precision for weights and activations to INT8 in a cost-free way for both BERT and GPT-3-style models with minimal accuracy impact, which leads to up to 5.19x/4.16x speedup on those models compared to FP16 inference.

Journal Article

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar, +3 more

- 31 Oct 2022 -

arXiv.org

TL;DR: GPTQ quantizes GPT models with 175 billion parameters in approximately four GPU hours, reducing the bit-width to 3 or 4 bits per weight with negligible accuracy degradation relative to the uncompressed baseline.

Journal Article

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

Elias Frantar, +1 more

- 02 Jan 2023 -

arXiv.org

TL;DR: SparseGPT prunes large-scale generative pretrained transformer (GPT) family models to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy.

Proceedings Article

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

Elias Frantar, +1 more

TL;DR: A new compression framework is introduced that covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This work proposes a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Proceedings Article

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, +20 more

TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

Journal Article

The Pascal Visual Object Classes Challenge: A Retrospective

Mark Everingham, +5 more

- 01 Jan 2015 -

International Journal of Computer Vision

TL;DR: A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.

Journal of Machine Learning Research

Quantizing deep convolutional networks for efficient inference: A whitepaper

Raghuraman Krishnamoorthi

- 21 Jun 2018 -

arXiv: Learning
