Open Access · Posted Content

Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization.

TL;DR
Zhang et al. as discussed by the authors proposed a sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into the loss function to guide the search process.
Abstract
Since model quantization reduces model size and computation latency, it has been successfully applied in many applications on mobile phones, embedded devices and smart chips. A mixed-precision quantization model can assign different quantization bit-precisions according to the sensitivity of each layer to achieve great performance. However, it is difficult to quickly determine the quantization bit-precision of each layer in deep neural networks under given constraints (e.g., hardware resources, energy consumption, model size and computation latency). To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into the loss function to guide the search process. A single path search cell is used to build a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to their selection certainties, which exponentially reduces the search space and speeds up the convergence of the search process. Experiments show that our method can efficiently search mixed-precision models for different architectures (e.g., ResNet-20, 18, 34, 50 and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet and COCO) under given constraints, and our results verify that the models found by SSPS significantly outperform their uniform-precision counterparts.
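The three ingredients named in the abstract lend themselves to a short sketch: a single-path cell that mixes candidate bit-widths through softmax-weighted architecture parameters, a constraint term added to the loss, and a sequential pruning step driven by selection certainty. The PyTorch code below is a minimal illustration under assumed choices (the fake-quantizer, the candidate set of 2/4/8 bits, the 0.9 certainty threshold and all names are illustrative), not the authors' implementation.

```python
# Minimal sketch of the SSPS idea in standard PyTorch; quantizer,
# candidates and threshold are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, bits):
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (q - w).detach()  # STE: forward quantized, backward identity

class MixedPrecisionConv(nn.Module):
    """Single-path search cell: one conv whose weight is quantized at a
    softmax-weighted mixture of the remaining candidate bit-widths."""
    def __init__(self, cin, cout, candidates=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1, bias=False)
        self.candidates = list(candidates)
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # arch params

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        w = sum(p * fake_quant(self.conv.weight, b)
                for p, b in zip(probs, self.candidates))
        return F.conv2d(x, w, padding=1)

    def prune(self, threshold=0.9):
        """Sequential step: once one candidate dominates (high selection
        certainty), fix it and shrink the search space."""
        probs = F.softmax(self.alpha, dim=0)
        if probs.max().item() >= threshold and len(self.candidates) > 1:
            keep = probs.argmax().item()
            self.candidates = [self.candidates[keep]]
            self.alpha = nn.Parameter(torch.zeros(1))

def constrained_loss(task_loss, cells, budget_bits, lam=0.1):
    """Task loss plus a penalty steering the expected average bit-width
    toward the given constraint (standing in for size/latency budgets)."""
    exp_bits = torch.stack([
        (F.softmax(c.alpha, dim=0) *
         torch.tensor(c.candidates, dtype=torch.float)).sum()
        for c in cells]).mean()
    return task_loss + lam * F.relu(exp_bits - budget_bits)
```

Training would alternate gradient steps on weights and architecture parameters, calling `prune()` periodically so the candidate set shrinks layer by layer.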


Citations
Proceedings ArticleDOI

Pareto-Optimal Quantized ResNet Is Mostly 4-bit

TL;DR: In this paper, the effects of quantization on the inference cost-quality tradeoff curve were investigated using ResNet as a case study, and quantization-aware training achieved state-of-the-art results on ImageNet for 4-bit ResNet-50.
Posted Content

MWQ: Multiscale Wavelet Quantized Neural Networks.

TL;DR: Wang et al. as mentioned in this paper proposed a multiscale wavelet quantization (MWQ) method, which decomposes the original data into multiscale frequency components by wavelet transform and then quantizes the components at each scale separately.
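As a rough illustration of the decompose-then-quantize idea, the sketch below applies a one-level Haar transform in NumPy and quantizes the low- and high-frequency components at different bit-widths before reconstructing; the per-scale bit allocation is an assumed example, not the paper's actual scheme.

```python
# Minimal decompose-then-quantize sketch with a one-level Haar transform;
# the 8-bit/4-bit split per scale is an illustrative assumption.
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of an array to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax if np.abs(x).max() > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def mwq(x, low_bits=8, high_bits=4):
    """Split into low/high-frequency Haar components, quantize each scale
    with its own bit-width, then reconstruct."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (low frequency)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (high frequency)
    a, d = quantize(a, low_bits), quantize(d, high_bits)
    out = np.empty_like(x)
    out[0::2] = (a + d) / np.sqrt(2)       # inverse Haar transform
    out[1::2] = (a - d) / np.sqrt(2)
    return out

x = np.random.randn(16).astype(np.float32)
print(np.abs(x - mwq(x)).max())  # reconstruction error due to quantization
```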
Posted Content

One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment.

TL;DR: In this paper, a hot-swappable quantization strategy is proposed that provides a specific quantization scheme for each bit-width candidate through multiscale quantization, which significantly improves the performance of every candidate, especially at ultra-low bit-widths.
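A minimal sketch of what hot-swap bit-width adjustment could look like: one set of shared weights served at several bit-widths, with a per-bit-width quantization scale selected at runtime. The layer, the parameter names and the scale-per-candidate design are assumptions for illustration; the paper's multiscale scheme is more involved.

```python
# Hypothetical hot-swappable layer: shared weights, one learnable
# quantization scale per supported bit-width, switchable at runtime.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HotSwapLinear(nn.Module):
    def __init__(self, cin, cout, bit_widths=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(cout, cin) * 0.02)
        self.scales = nn.ParameterDict({
            str(b): nn.Parameter(torch.ones(1)) for b in bit_widths})
        self.active_bits = max(bit_widths)

    def swap(self, bits):
        """Switch the served bit-width at runtime, without retraining."""
        assert str(bits) in self.scales
        self.active_bits = bits

    def forward(self, x):
        b = self.active_bits
        qmax = 2 ** (b - 1) - 1
        s = self.scales[str(b)].abs() + 1e-8
        w = torch.clamp(torch.round(self.weight / s), -qmax, qmax) * s
        w = self.weight + (w - self.weight).detach()  # straight-through
        return F.linear(x, w)

layer = HotSwapLinear(16, 8)
layer.swap(4)                      # serve the same weights at 4 bits
y = layer(torch.randn(2, 16))
```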
Posted Content

Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration.

TL;DR: In this paper, a novel encoding scheme using {-1, +1} is proposed to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (i.e., xnor and bitcount) to achieve model compression, computational acceleration, and resource saving.
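The decomposition can be made concrete with a little arithmetic: an odd M-bit integer q can be written as q = sum_j 2^j * b_j with b_j in {-1, +1}, so a quantized dot product splits into M^2 binary dot products, each computable as n - 2*popcount(xnor). The pure-Python sketch below verifies this identity; the bit tricks stand in for hardware xnor/bitcount instructions rather than reproducing the paper's networks.

```python
# Verify the {-1,+1} decomposition identity with pure-Python bit tricks.
import random

def decompose(q, m):
    """Write q = sum_j 2**j * b_j with b_j in {-1,+1}; valid for odd q
    in [-(2**m - 1), 2**m - 1]."""
    t = (q + (1 << m) - 1) // 2           # shift into [0, 2**m - 1]
    return [2 * ((t >> j) & 1) - 1 for j in range(m)]

def pack(bits_pm1):
    """Pack a {-1,+1} vector into an integer bitmask (bit 1 means +1)."""
    mask = 0
    for i, b in enumerate(bits_pm1):
        if b == 1:
            mask |= 1 << i
    return mask

def binary_dot(a_mask, b_mask, n):
    """{-1,+1} dot product via xnor + bitcount:
    dot = matches - mismatches = n - 2 * popcount(a XOR b)."""
    return n - 2 * bin(a_mask ^ b_mask).count("1")

m, n = 3, 16
xs = [random.choice(range(-(2**m - 1), 2**m, 2)) for _ in range(n)]
ys = [random.choice(range(-(2**m - 1), 2**m, 2)) for _ in range(n)]
# Bit-plane masks: plane j holds digit b_j of every element.
xp = [pack([decompose(v, m)[j] for v in xs]) for j in range(m)]
yp = [pack([decompose(v, m)[j] for v in ys]) for j in range(m)]
dot = sum((1 << (i + j)) * binary_dot(xp[i], yp[j], n)
          for i in range(m) for j in range(m))
assert dot == sum(x * y for x, y in zip(xs, ys))
```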
References
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset is presented with the goal of advancing the state of the art in object recognition by placing it in the context of the broader question of scene understanding, achieved by gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings Article

Faster R-CNN: towards real-time object detection with region proposal networks

TL;DR: Ren et al. as discussed by the authors proposed a region proposal network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
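For reference, the core of an RPN head is small enough to sketch: a 3x3 convolution over the shared backbone features followed by two sibling 1x1 convolutions emitting per-anchor objectness scores and box deltas. The channel and anchor counts below are assumed defaults, not tied to a specific implementation.

```python
# Minimal RPN-head sketch in PyTorch, assuming k anchors per spatial
# location on a shared backbone feature map.
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, channels=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.cls = nn.Conv2d(channels, num_anchors, 1)      # objectness per anchor
        self.reg = nn.Conv2d(channels, num_anchors * 4, 1)  # box deltas per anchor

    def forward(self, feat):
        t = torch.relu(self.conv(feat))
        return self.cls(t), self.reg(t)

# The head slides over the shared full-image features, so proposals add
# almost no cost on top of the detection backbone.
feat = torch.randn(1, 256, 50, 50)          # backbone feature map
scores, deltas = RPNHead()(feat)
```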

Automatic differentiation in PyTorch

TL;DR: The automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models, is described; it performs differentiation of purely imperative programs with a focus on extensibility and low overhead.
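The "purely imperative" style is easy to demonstrate: gradients are recorded while ordinary Python code runs, then recovered by a backward pass.

```python
# Reverse-mode autodiff of an imperative program in PyTorch.
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # plain imperative computation, traced on the fly
y.backward()         # reverse-mode differentiation
print(x.grad)        # tensor([2., 4., 6.]) == dy/dx = 2x
```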
Proceedings ArticleDOI

Focal Loss for Dense Object Detection

TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss so that it down-weights the loss assigned to well-classified examples. The resulting Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
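The reshaped loss is compact enough to write out: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), which multiplies cross entropy by a factor that vanishes for well-classified examples. A minimal binary-classification sketch, using the commonly reported defaults gamma = 2 and alpha = 0.25:

```python
# Focal loss for binary classification with logits.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)        # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()  # down-weight easy examples

loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())
```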
Proceedings ArticleDOI

Learning Transferable Architectures for Scalable Image Recognition

TL;DR: NASNet as discussed by the authors searches for an architectural building block on a small dataset and then transfers the block to a larger dataset, making the learned architecture transferable while achieving state-of-the-art performance.