Open Access Proceedings Article (DOI)

Data-Free Quantization Through Weight Equalization and Bias Correction

TLDR
This work introduces a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection, and achieves near-original model performance on common computer vision architectures and tasks.
Abstract
We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference on modern deep learning hardware. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects the biased error that quantization introduces in the layer outputs. This improves quantized-model accuracy and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
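To make the two steps concrete, the following is a minimal NumPy sketch of cross-layer weight equalization and bias correction for a pair of fully connected layers separated by a ReLU. The function names, the fully connected setting, and the assumption that the input mean `mean_in` is available (in the paper it is derived data-free from batch-normalization statistics) are our own, not the authors' API.

```python
import numpy as np

def equalize_pair(w1, b1, w2, eps=1e-8):
    """Cross-layer weight equalization (illustrative sketch).

    For y = w2 @ relu(w1 @ x + b1), ReLU satisfies relu(s * x) = s * relu(x)
    for any s > 0, so dividing output channel i of layer 1 by s_i while
    multiplying the matching input channel of layer 2 by s_i leaves the
    network function unchanged. Choosing s_i = sqrt(r1_i * r2_i) / r2_i
    equalizes the per-channel weight ranges, which reduces the error of
    per-tensor quantization.
    """
    r1 = np.abs(w1).max(axis=1)           # range of each output channel of layer 1
    r2 = np.abs(w2).max(axis=0)           # range of each input channel of layer 2
    s = np.sqrt(r1 * r2) / (r2 + eps)     # equalizing scale per channel
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

def correct_bias(b, w_fp, w_q, mean_in):
    """Bias correction (illustrative sketch).

    Quantizing weights adds an error eps = w_q - w_fp whose expected effect
    on the layer output is eps @ E[x]. Subtracting that term from the bias
    removes the shift; E[x] (`mean_in` here) can be obtained without data
    from the batch-normalization statistics of the preceding layer.
    """
    return b - (w_q - w_fp) @ mean_in
```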



Citations
Proceedings ArticleDOI

CrypTFlow: Secure TensorFlow Inference

TL;DR: In this article, the authors present CrypTFlow, a system that converts TensorFlow inference code into Secure Multi-Party Computation (MPC) protocols at the push of a button.
Proceedings ArticleDOI

Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

TL;DR: This paper analyzes two popular network compression techniques, i.e. filter pruning and low-rank decomposition, within a unified framework and proposes to compress the whole network jointly instead of in a layer-wise manner; a generic sketch of the underlying group-sparsity idea follows.
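As a generic illustration of group sparsity (our sketch, not the paper's exact formulation), a group-lasso penalty sums the L2 norms of filter groups, driving whole groups to zero so they can be pruned or factored out:

```python
import numpy as np

def group_lasso(w):
    """Group-lasso penalty: sum over groups of the L2 norm of each group.

    With w of shape (out_channels, fan_in), each row is one filter; adding
    this penalty to the training loss pushes entire filters toward zero,
    which is what links structured pruning and low-rank decomposition.
    """
    return np.sqrt((w ** 2).sum(axis=1)).sum()
```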
Posted Content

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

TL;DR: This paper presents a workflow for 8-bit quantization that is able to maintain accuracy within 1% of the floating-point baseline on all networks studied, including models that are more difficult to quantize, such as MobileNets and BERT-large.
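For context on what such a workflow computes, here is a minimal sketch of the standard uniform affine (asymmetric) 8-bit scheme; this is the generic textbook formulation, not code from the paper:

```python
import numpy as np

def quantize_uint8(x):
    """Uniform affine quantization to unsigned 8-bit (generic sketch)."""
    scale = max((x.max() - x.min()) / 255.0, 1e-8)    # float range -> 256 levels
    zero_point = int(np.round(-x.min() / scale))      # integer code for real 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit codes."""
    return scale * (q.astype(np.float32) - zero_point)
```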
Posted Content

Up or Down? Adaptive Rounding for Post-Training Quantization

TL;DR: AdaRound is proposed, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss; it outperforms rounding-to-nearest by a significant margin and establishes a new state-of-the-art for post-training quantization on several networks and tasks.
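The core idea can be sketched as follows: each weight is floored, and a learnable offset decides whether it rounds up or down. This shows only the parametrization with the paper's rectified sigmoid; the data-driven optimization of `v` against a per-layer reconstruction loss is omitted:

```python
import numpy as np

def adaround_quantize(w, v, scale):
    """AdaRound-style rounding (parametrization sketch only).

    h(v) is a rectified sigmoid squashed into [0, 1]; during optimization
    it relaxes the binary up/down choice, and after training it saturates
    so each weight is floored (h = 0) or ceiled (h = 1).
    """
    h = np.clip(1.2 / (1.0 + np.exp(-v)) - 0.1, 0.0, 1.0)
    return scale * (np.floor(w / scale) + h)
```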
Proceedings ArticleDOI

CrypTFlow2: Practical 2-Party Secure Inference

TL;DR: In this article, the authors present CrypTFlow2, a cryptographic framework for secure inference over realistic deep neural networks (DNNs) using secure 2-party computation.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
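As a small illustration of the idea (our sketch, not the paper's code): a residual block adds an identity shortcut so the layers only have to learn a correction F(x) rather than the full mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(16, 16)) * 0.1, rng.normal(size=(16, 16)) * 0.1
relu = lambda z: np.maximum(z, 0.0)

def residual_block(x):
    """y = F(x) + x: the weights learn the residual F, while the identity
    shortcut lets the signal and gradients pass through unchanged."""
    return relu(x @ w1) @ w2 + x

y = residual_block(rng.normal(size=(4, 16)))   # same shape in and out
```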
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Book ChapterDOI

SSD: Single Shot MultiBox Detector

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
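A minimal sketch of how such default boxes are tiled (simplified from the paper: per-layer scales and the extra box for aspect ratio 1 are omitted):

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Tile one default box per aspect ratio at every feature-map cell.

    The detector then predicts class scores and coordinate offsets relative
    to these fixed boxes rather than free-form box coordinates.
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w, h = scale * np.sqrt(ar), scale / np.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return np.array(boxes)       # shape: (fmap_size**2 * len(aspect_ratios), 4)
```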
Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

TL;DR: Restricted Boltzmann machines, originally developed with binary stochastic hidden units, can instead use rectified linear units, which learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
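The unit itself is simple to sketch; per the paper, an infinite stack of tied binary units can be approximated during training by a rectified linear unit with input-dependent Gaussian noise (our NumPy rendering):

```python
import numpy as np

def noisy_relu(x, rng=None):
    """Noisy rectified linear unit: max(0, x + N(0, sigmoid(x))).

    The noise variance sigmoid(x) comes from the binary-unit approximation;
    at test time the unit is just the deterministic max(0, x).
    """
    rng = rng or np.random.default_rng()
    var = 1.0 / (1.0 + np.exp(-x))                 # sigmoid(x) as variance
    return np.maximum(0.0, x + rng.normal(scale=np.sqrt(var)))
```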
Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes, and large-scale geo-localization; a cost sketch follows.
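The two hyper-parameters are the width multiplier alpha (thins the channels of every layer) and the resolution multiplier rho (shrinks the feature maps); their effect on the mult-add cost of a depthwise separable layer can be sketched as:

```python
def separable_layer_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Mult-adds of one depthwise separable conv layer (sketch).

    dk: kernel size, m/n: input/output channels, df: square feature-map side.
    Both multipliers scale the cost roughly by alpha**2 * rho**2.
    """
    m, n, df = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m * df * df      # one dk x dk filter per channel
    pointwise = m * n * df * df            # 1x1 conv mixing channels
    return depthwise + pointwise

# e.g. a 3x3 layer with 512 -> 512 channels on a 14 x 14 map, at alpha = 0.75
print(separable_layer_cost(3, 512, 512, 14, alpha=0.75))
```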