Open Access, Posted Content
Up or Down? Adaptive Rounding for Post-Training Quantization
TLDR
AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss, is proposed. It outperforms rounding-to-nearest by a significant margin and establishes a new state of the art for post-training quantization on several networks and tasks.
Abstract:
When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of ResNet18 and ResNet50 to 4 bits while staying within an accuracy loss of 1%.
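The claim that rounding-to-nearest is not the best choice can be checked on a toy layer. The sketch below is not AdaRound itself. It brute-forces the binary up/down rounding choice for a single output neuron under a layer-wise loss, the mean of (w̃·x − w·x)² over a few hypothetical calibration inputs. Because every weight is either rounded down (floor) or up (floor + 1), rounding-to-nearest is one of the candidates, so the exhaustive optimum can never be worse. The search costs 2^n evaluations, which is why a method like AdaRound optimizes a relaxed version of this problem instead. The function names, step size, and data are illustrative assumptions.

```python
import itertools
import math
import random


def local_loss(w, w_q, xs):
    """Layer-wise loss: mean over inputs of (w_q.x - w.x)^2 for one output neuron."""
    total = 0.0
    for x in xs:
        err = sum((a - b) * xi for a, b, xi in zip(w_q, w, x))
        total += err * err
    return total / len(xs)


def round_to_nearest(w, scale):
    """Baseline: snap every weight independently to the nearest grid point."""
    return [scale * round(wi / scale) for wi in w]


def best_up_down(w, scale, xs):
    """Exhaustively try rounding each weight down (floor) or up (floor + 1).

    Solves the binary rounding problem exactly, but needs 2**len(w)
    loss evaluations, so it is only feasible for toy sizes.
    """
    floors = [math.floor(wi / scale) for wi in w]
    best_w, best_loss = None, float("inf")
    for up in itertools.product((0, 1), repeat=len(w)):
        w_q = [scale * (f + u) for f, u in zip(floors, up)]
        loss = local_loss(w, w_q, xs)
        if loss < best_loss:
            best_w, best_loss = w_q, loss
    return best_w, best_loss


if __name__ == "__main__":
    random.seed(0)
    scale = 0.25  # hypothetical quantization step size
    w = [random.uniform(-1.0, 1.0) for _ in range(6)]
    # Correlated toy inputs: every feature shares a common component,
    # so rounding errors on different weights can cancel or add up.
    xs = []
    for _ in range(32):
        common = random.gauss(0.0, 1.0)
        xs.append([common + 0.3 * random.gauss(0.0, 1.0) for _ in range(6)])
    nearest_loss = local_loss(w, round_to_nearest(w, scale), xs)
    _, search_loss = best_up_down(w, scale, xs)
    print(f"round-to-nearest loss: {nearest_loss:.6f}")
    print(f"best up/down loss:     {search_loss:.6f}")
```

With correlated inputs, independently rounding each weight ignores how the errors interact. The joint search can pick the up/down choices whose errors cancel in the output.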
Citations
Journal Article
Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
TL;DR: Two pipelines are introduced, advanced and light, where the former involves minimizing the quantization errors of each layer by optimizing its parameters over the calibration set and using integer programming to optimally allocate the desired bit-width for each layer while constraining accuracy degradation or model compression.
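The bit-width allocation step can be phrased as a small integer program: choose one bit-width per layer to minimize an additive degradation proxy, subject to a model-size budget. The sketch below does not use an integer-programming solver. It swaps in dynamic programming, which solves this multiple-choice knapsack exactly for small discrete instances. The additive-loss assumption, layer names, parameter counts, and per-layer degradation numbers are hypothetical and are not taken from the cited paper.

```python
def allocate_bits(layers, choices, budget_bits):
    """Pick one bit-width per layer to minimize a summed degradation proxy.

    layers: list of (name, num_params, {bits: estimated_loss_increase})
    choices: candidate bit-widths, e.g. (2, 4, 8)
    budget_bits: upper bound on sum(num_params * bits) over all layers
    Returns (total_loss, {name: bits}), or None if no assignment fits.
    """
    # DP state: total bits used so far -> (best loss, assignment achieving it)
    states = {0: (0.0, {})}
    for name, n_params, loss_for in layers:
        nxt = {}
        for used, (loss, assign) in states.items():
            for b in choices:
                cost = used + n_params * b
                if cost > budget_bits:
                    continue  # this choice would exceed the size budget
                cand = loss + loss_for[b]
                if cost not in nxt or cand < nxt[cost][0]:
                    nxt[cost] = (cand, {**assign, name: b})
        states = nxt
    if not states:
        return None
    return min(states.values(), key=lambda s: s[0])


if __name__ == "__main__":
    # Hypothetical sensitivities: "fc" degrades badly at low precision.
    layers = [
        ("conv1", 1000, {2: 0.3, 4: 0.1, 8: 0.0}),
        ("conv2", 4000, {2: 0.8, 4: 0.2, 8: 0.05}),
        ("fc", 2000, {2: 4.0, 4: 1.5, 8: 0.1}),
    ]
    budget = 4 * sum(n for _, n, _ in layers)  # same size as uniform 4-bit
    print(allocate_bits(layers, (2, 4, 8), budget))
```

With this toy data, the solver spends the budget unevenly. It gives 8 bits to the sensitive layer and pays for them with 2 bits on a tolerant one, which beats uniform 4-bit under the same size constraint.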
Proceedings Article
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
TL;DR: This work is able to show that ZeroQuant can reduce the precision for weights and activations to INT8 in a cost-free way for both BERT and GPT-3-style models with minimal accuracy impact, which leads to up to 5.19x/4.16x speedup on those models compared to FP16 inference.
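INT8 weight quantization of this kind maps each value to an 8-bit integer plus a floating-point scale. The sketch below shows symmetric INT8 quantization with one scale per contiguous group of weights. The group-wise layout, the group size, and the symmetric [-127, 127] range are assumptions chosen for illustration, and the sketch does not reproduce ZeroQuant's actual scheme or kernels.

```python
def quantize_int8_symmetric(values):
    """Symmetric INT8 for one group: s = max|v| / 127, q = clamp(round(v / s), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def groupwise_int8(row, group_size):
    """Split a weight row into contiguous groups, each quantized with its own scale."""
    return [
        quantize_int8_symmetric(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    row = [0.5, -1.2, 0.03, 2.0, -0.7, 0.0, 0.01, -0.02]
    for codes, scale in groupwise_int8(row, group_size=4):
        print(codes, scale, dequantize(codes, scale))
```

Per-group scales let a group of small weights use a finer step than a group containing an outlier. The rounding error for each element is at most half of its group's scale.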
Journal Article
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
TL;DR: GPTQ quantizes GPT models with 175 billion parameters in approximately four GPU hours, reducing the bit-width to 3 or 4 bits per weight with negligible accuracy degradation relative to the uncompressed baseline.
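The core idea behind GPTQ-style quantization is to quantize weights one at a time and push each rounding error onto the weights that are not yet quantized, using the inverse Hessian of the layer-wise squared error. The sketch below is a simplified, unbatched version for a single weight row. It assumes a fixed left-to-right order, a symmetric per-row grid, and damping of 1% of the mean Hessian diagonal. It updates the inverse Hessian with an explicit elimination step. The published method processes columns in blocks and works with a Cholesky factorization instead, so this is only an illustration of the update rule. All names are hypothetical.

```python
import random


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small SPD matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def row_loss(w, w_q, xs):
    """Sum over calibration inputs of the squared output error for one row."""
    return sum(sum((a - b) * xi for a, b, xi in zip(w, w_q, x)) ** 2 for x in xs)


def gptq_row(w, xs, bits=4, damp=0.01):
    """Quantize one weight row with sequential error compensation.

    Returns (integer codes, scale); the dequantized row is codes * scale.
    """
    d = len(w)
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in w)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Hessian of the squared output error: H = 2 * sum_x x x^T, plus damping.
    h = [[2.0 * sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    mean_diag = sum(h[i][i] for i in range(d)) / d
    for i in range(d):
        h[i][i] += damp * mean_diag
    hinv = invert(h)
    w = list(w)  # work on a copy; not-yet-quantized weights are updated in place
    codes = []
    for q in range(d):
        code = max(-qmax, min(qmax, round(w[q] / scale)))
        codes.append(code)
        # Rounding error scaled by the inverse-Hessian diagonal entry.
        err = (w[q] - code * scale) / hinv[q][q]
        # Compensate: shift the remaining weights to absorb the error.
        for j in range(q + 1, d):
            w[j] -= err * hinv[j][q]
        # Remove index q from the inverse Hessian (one Gaussian elimination step).
        col = [hinv[i][q] for i in range(d)]
        row = list(hinv[q])
        piv = hinv[q][q]
        for i in range(d):
            for j in range(d):
                hinv[i][j] -= col[i] * row[j] / piv
    return codes, scale


if __name__ == "__main__":
    random.seed(0)
    d = 8
    xs = []
    for _ in range(64):
        common = random.gauss(0.0, 1.0)
        xs.append([common + 0.5 * random.gauss(0.0, 1.0) for _ in range(d)])
    w = [random.gauss(0.0, 1.0) for _ in range(d)]
    codes, scale = gptq_row(w, xs, bits=4)
    rtn = [scale * max(-7, min(7, round(v / scale))) for v in w]
    print("round-to-nearest loss:", row_loss(w, rtn, xs))
    print("compensated loss:     ", row_loss(w, [c * scale for c in codes], xs))
```

The compensation step is greedy, so on correlated inputs it is typically, but not provably always, better than independent rounding-to-nearest.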
Journal Article
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh
TL;DR: SparseGPT prunes large-scale generative pretrained transformer (GPT) family models to at least 50% sparsity in one shot, without any retraining and with minimal loss of accuracy.
Proceedings Article
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
Elias Frantar, Dan Alistarh
TL;DR: Introduces a new compression framework that covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves on the practical performance of existing post-training methods.