Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization.

Open Access · Posted Content

Qigong Sun, +5 more

04 Mar 2021

arXiv: Computer Vision and Pattern Recog...

TLDR

The authors propose a sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into the loss function to guide the search process.
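
The TLDR states that the constraints enter the loss function but gives no formula. A common pattern in differentiable precision search, assumed here rather than taken from the paper, is to compute the expected resource cost under softmax-relaxed per-layer bit-width choices and penalize it when it exceeds a budget. The sketch below uses model size in bits as the constraint and a hinge penalty with a hypothetical weight `lam`; all function names are illustrative. It shows only the forward computation; in an actual search it would be written in an autodiff framework so that the penalty's gradient reaches the per-layer logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_model_size_bits(layer_params, layer_logits, candidate_bits):
    """Expected weight storage (in bits) under per-layer precision distributions.

    layer_params[l]: number of weights in layer l
    layer_logits[l]: hypothetical architecture logits, one per candidate bit-width
    candidate_bits:  candidate weight bit-widths, e.g. [2, 4, 8]
    """
    size = 0.0
    for n_params, logits in zip(layer_params, layer_logits):
        probs = softmax(logits)
        expected_bits = sum(p * b for p, b in zip(probs, candidate_bits))
        size += n_params * expected_bits
    return size


def constrained_loss(task_loss, expected_cost, target_cost, lam=0.1):
    """Task loss plus a hinge penalty that is active only above the target cost.

    The hinge form and the weight `lam` are illustrative assumptions,
    not the paper's published loss.
    """
    overshoot = max(0.0, expected_cost / target_cost - 1.0)
    return task_loss + lam * overshoot


if __name__ == "__main__":
    params = [1000, 2000]                          # two toy layers
    logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # uniform over {2, 4, 8} bits
    size = expected_model_size_bits(params, logits, [2, 4, 8])
    target = 4 * sum(params)                       # budget: a uniform 4-bit model
    print(size, constrained_loss(1.0, size, target))
```

With uniform logits each layer averages 14/3 bits per weight, so the expected size exceeds the 4-bit budget and the penalty pushes probability mass toward lower bit-widths; once the expected cost falls under the target, only the task loss remains.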

Abstract:

Since model quantization helps to reduce model size and computation latency, it has been successfully applied in many applications on mobile phones, embedded devices and smart chips. A mixed-precision quantization model can assign different quantization bit-precisions to different layers according to their sensitivity, achieving high performance. However, quickly determining the quantization bit-precision of each layer in a deep neural network under given constraints (e.g., hardware resources, energy consumption, model size and computation latency) is a difficult problem. To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into its loss function to guide the search process. A single path search cell is used to build a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to their selection certainties, which exponentially reduces the search space and speeds up the convergence of the search process. Experiments show that our method can efficiently search mixed-precision models for different architectures (e.g., ResNet-20, 18, 34, 50 and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet and COCO) under given constraints, and the results verify that the models found by SSPS significantly outperform their uniform-precision counterparts.
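
The abstract names two mechanisms without giving formulas: a single path search cell and the sequential fixing of precisions by selection certainty. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: (1) every candidate bit-width quantizes the same shared full-precision weights (the "single path"), and the quantized copies are mixed by softmax weights over hypothetical per-layer logits; (2) "selection certainty" is taken to be a layer's top softmax probability, and the most certain undecided layer is fixed once that probability exceeds a hypothetical threshold of 0.9. The symmetric uniform quantizer, the threshold, and all function names are illustrative; the gradient updates of weights and logits between fixing steps are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quantize_symmetric(w, bits):
    """Uniform symmetric quantization of a list of weights (illustrative; bits >= 2)."""
    assert bits >= 2, "this toy quantizer needs at least one positive level"
    max_abs = max(abs(x) for x in w) or 1.0   # avoid a zero scale for all-zero weights
    levels = 2 ** (bits - 1) - 1              # e.g. 127 positive levels for 8 bits
    scale = max_abs / levels
    return [round(x / scale) * scale for x in w]


def single_path_weight(w, logits, candidate_bits):
    """Mix quantized copies of ONE shared weight tensor by softmax(logits).

    All candidate precisions reuse the same weights `w`, so the cell has a
    single path instead of a separate branch per bit-width.
    """
    probs = softmax(logits)
    quantized = [quantize_symmetric(w, b) for b in candidate_bits]
    return [sum(p * q[i] for p, q in zip(probs, quantized)) for i in range(len(w))]


def fix_most_certain_layer(layer_logits, fixed, candidate_bits, threshold=0.9):
    """Fix the undecided layer whose top probability is largest and exceeds `threshold`.

    Updates `fixed` (layer index -> chosen bit-width) in place and returns the
    fixed layer index, or None if no undecided layer is certain enough.
    """
    best_layer, best_prob, best_idx = None, threshold, None
    for l, logits in enumerate(layer_logits):
        if l in fixed:
            continue
        probs = softmax(logits)
        k = max(range(len(probs)), key=probs.__getitem__)
        if probs[k] > best_prob:
            best_layer, best_prob, best_idx = l, probs[k], k
    if best_layer is not None:
        fixed[best_layer] = candidate_bits[best_idx]
    return best_layer


def search_space_size(num_layers, num_candidates, num_fixed):
    """Number of remaining per-layer precision assignments."""
    return num_candidates ** (num_layers - num_fixed)


if __name__ == "__main__":
    fixed = {}
    logits = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
    while fix_most_certain_layer(logits, fixed, [2, 4, 8]) is not None:
        print("fixed so far:", fixed)
```

Fixing m of L layers shrinks the remaining space from K^L to K^(L-m) assignments. With L = 20 layers and K = 3 candidates, fixing five layers reduces about 3.5 × 10^9 assignments to 14,348,907, which is the sense in which sequential decisions reduce the search space exponentially.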

Citations

Pareto-Optimal Quantized ResNet Is Mostly 4-bit

MWQ: Multiscale Wavelet Quantized Neural Networks

One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment

Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration

Related Papers (5)

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

Differentiable Joint Pruning and Quantization for Hardware Efficiency