Open Access Book Chapter DOI

Value-Aware Quantization for Training and Inference of Neural Networks

TL;DR
In this paper, value-aware quantization is proposed to apply aggressively reduced precision to the majority of data while separately handling a small number of large values in high precision, which reduces total quantization errors under very low precision.
Abstract
We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small number of large values in high precision, which reduces total quantization errors under very low precision. We present new techniques to apply the proposed quantization to training and inference. The experiments show that our method with 3-bit activations (with 2% of large ones) gives the same training accuracy as full-precision training while offering significant (41.6% and 53.7%) reductions in the memory cost of activations in ResNet-152 and Inception-v3 compared with the state-of-the-art method. Our experiments also show that deep networks such as Inception-v3, ResNet-101 and DenseNet-121 can be quantized for inference with 4-bit weights and activations (with 1% 16-bit data) within a 1% top-1 accuracy drop.
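The core idea can be illustrated with a short sketch. The snippet below is a minimal NumPy illustration, not the authors' implementation: it keeps roughly the top 1-2% largest-magnitude values in full precision and applies uniform low-bit quantization only to the remaining values. The threshold choice, symmetric rounding scheme, and function name are assumptions made for illustration.

```python
import numpy as np

def value_aware_quantize(x, bits=4, outlier_ratio=0.01):
    """Illustrative value-aware quantization (not the paper's exact algorithm).

    Keeps the `outlier_ratio` fraction of largest-magnitude entries in full
    precision and uniformly quantizes the remaining entries to `bits` bits.
    """
    flat = x.ravel()
    k = max(1, int(np.ceil(outlier_ratio * flat.size)))
    # Threshold separating "large" values (kept in high precision) from the rest.
    thresh = np.partition(np.abs(flat), flat.size - k)[flat.size - k]
    outlier_mask = np.abs(x) >= thresh

    # Uniform symmetric quantization of the low-precision majority.
    levels = 2 ** (bits - 1) - 1
    small = np.where(outlier_mask, 0.0, x)
    scale = np.max(np.abs(small)) / levels if np.any(~outlier_mask) else 1.0
    scale = scale if scale > 0 else 1.0
    q_small = np.round(small / scale) * scale

    # Reassemble: quantized majority plus full-precision outliers.
    return np.where(outlier_mask, x, q_small), outlier_mask

# Example: quantize a synthetic activation tensor to 3 bits with 2% outliers.
acts = np.random.randn(1024).astype(np.float32) * 5
q_acts, mask = value_aware_quantize(acts, bits=3, outlier_ratio=0.02)
print("mean abs error:", np.abs(acts - q_acts).mean())
```

Because only the small outlier set needs high-precision storage, the memory cost is dominated by the low-bit majority, which is where the reported activation-memory savings come from.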



Citations
Proceedings Article DOI

HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision

TL;DR: Hessian AWare Quantization (HAWQ), a novel second-order quantization method, is introduced; it automatically selects the relative quantization precision of each layer based on the layer's Hessian spectrum.
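The sketch below is a hypothetical illustration of that idea, not HAWQ's implementation: it estimates each layer's top Hessian eigenvalue with power iteration on an explicit per-layer Hessian matrix (HAWQ itself works with Hessian-vector products on the full network) and assigns more bits to layers with larger eigenvalues. The `layer_hessians` input and the even-split bit assignment are assumptions for illustration.

```python
import numpy as np

def top_eigenvalue(hessian, iters=50):
    """Power iteration for the largest eigenvalue of a symmetric matrix."""
    v = np.random.randn(hessian.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = hessian @ v
        v /= np.linalg.norm(v)
    return float(v @ hessian @ v)

def assign_bits(layer_hessians, bit_choices=(8, 4, 2)):
    """Hypothetical rule: layers with larger top eigenvalues (more sensitive
    to weight perturbation) receive higher bit-widths."""
    eigs = {name: top_eigenvalue(h) for name, h in layer_hessians.items()}
    ranked = sorted(eigs, key=eigs.get, reverse=True)
    chunk = max(1, int(np.ceil(len(ranked) / len(bit_choices))))
    return {name: bit_choices[min(i // chunk, len(bit_choices) - 1)]
            for i, name in enumerate(ranked)}

# Example with random symmetric positive semidefinite "Hessians" for three layers.
rng = np.random.default_rng(0)
hs = {f"layer{i}": (lambda a: a @ a.T)(rng.standard_normal((16, 16))) for i in range(3)}
print(assign_bits(hs))
```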
Proceedings Article DOI

ZeroQ: A Novel Zero Shot Quantization Framework

TL;DR: ZeroQ enables mixed-precision quantization without any access to the training or validation data and can finish the entire quantization process in less than 30 seconds, incurring very low computational overhead.
Proceedings Article DOI

Energy-efficient neural network accelerator based on outlier-aware low-precision computation

TL;DR: The outlier-aware accelerator (OLAccel) performs dense and low-precision computations for the majority of data (weights and activations) while efficiently handling a small number of sparse and high-precision outliers (e.g., amounting to 3% of total data).
Posted Content

Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model.

TL;DR: This work quantizes a trained Transformer machine language translation model, leveraging INT8/VNNI instructions in the latest Intel Cascade Lake processors to improve inference performance while keeping the accuracy drop below 0.5%.
Posted Content

HAWQV3: Dyadic Neural Network Quantization

TL;DR: This work presents HAWQV3, a novel dyadic quantization framework, and shows that mixed-precision INT4/8 quantization achieves higher speedups than INT8 inference with minimal impact on accuracy.
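"Dyadic" here means every requantization scale is expressed as an integer numerator over a power-of-two denominator, so inference needs only integer multiplies and bit shifts. The helper below is a hypothetical illustration of that representation, not HAWQV3 code; the 16-bit shift and the example scale are arbitrary assumptions.

```python
def to_dyadic(scale, shift_bits=16):
    """Approximate a real scale as b / 2**shift_bits with integer b."""
    return int(round(scale * (1 << shift_bits))), shift_bits

def dyadic_rescale(acc, b, c):
    """Apply the dyadic scale to an integer accumulator using only integer ops."""
    return (acc * b) >> c

b, c = to_dyadic(0.0123)           # e.g. a requantization scale s_w * s_x / s_y
print(dyadic_rescale(1000, b, c))  # ~= round(1000 * 0.0123)
```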
References
Report DOI

Building a Large Annotated Corpus of English: The Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Book Chapter DOI

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

TL;DR: The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms these methods by large margins on ImageNet, by more than 16% in top-1 accuracy.
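In Binary-Weight-Networks, real-valued weights are approximated by a sign tensor times a single scaling factor equal to the mean absolute weight. A minimal NumPy sketch of that approximation (applied here to a whole tensor; XNOR-Net computes the scale per filter):

```python
import numpy as np

def binarize_weights(w):
    """Approximate w ~= alpha * sign(w) with alpha = mean(|w|)."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha

w = np.random.randn(64, 3, 3, 3)          # e.g. a conv filter bank
w_bin, alpha = binarize_weights(w)
print("mean approximation error:", np.abs(w - w_bin).mean())
```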
Posted Content

In-Datacenter Performance Analysis of a Tensor Processing Unit

TL;DR: This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 to accelerate the inference phase of neural networks (NN), and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
Posted Content

Federated Learning: Strategies for Improving Communication Efficiency

TL;DR: Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
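A minimal sketch of a "sketched update" in the sense above, assuming a simple pipeline of random subsampling followed by uniform quantization; the random-rotation step is omitted for brevity, and all function and parameter names are illustrative rather than taken from the paper.

```python
import numpy as np

def sketch_update(update, keep_ratio=0.1, bits=8, seed=0):
    """Illustrative sketched update: random subsampling + uniform quantization."""
    rng = np.random.default_rng(seed)
    mask = rng.random(update.shape) < keep_ratio       # random subsampling mask
    vals = update[mask]
    lo, hi = vals.min(), vals.max()
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    q = np.round((vals - lo) / scale).astype(np.uint8)  # uniform b-bit quantization
    return q, lo, scale, mask

def reconstruct(q, lo, scale, mask, keep_ratio=0.1):
    out = np.zeros(mask.shape, dtype=np.float64)
    # Divide by keep_ratio so the reconstruction is unbiased w.r.t. the random mask.
    out[mask] = (q.astype(np.float64) * scale + lo) / keep_ratio
    return out

# Example: compress and reconstruct a synthetic model update.
upd = np.random.randn(10_000)
q, lo, scale, mask = sketch_update(upd, keep_ratio=0.1, bits=8)
print("reconstruction error:", np.abs(upd - reconstruct(q, lo, scale, mask)).mean())
```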
Posted Content

Recurrent Neural Network Regularization

TL;DR: This paper shows how to correctly apply dropout to LSTMs, and shows that it substantially reduces overfitting on a variety of tasks.