Proceedings Article

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

20 Feb 2020 · Vol. 35, Iss. 10, pp. 8697-8705
TL;DR: This paper proposes multipoint quantization, a method that approximates a full-precision weight vector with a linear combination of multiple low-bit vectors, in contrast to typical quantization methods that approximate each weight with a single low-precision number.
Abstract: We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods, which approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an "effect of mixed precision" without physical mixed-precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented with common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it generalizes to more challenging tasks such as PASCAL VOC object detection.
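The greedy construction above can be sketched in a few lines: repeatedly quantize the residual to a low-bit vector and fit a least-squares scaling coefficient. This is only an illustrative sketch; the helper `quantize_lowbit`, the chosen bit-width, and the fixed number of points are assumptions, not the paper's exact procedure (which adapts the point count to the output error).

```python
import numpy as np

def quantize_lowbit(v, bits):
    # Uniformly quantize v onto a symmetric low-bit grid (hypothetical helper).
    qmax = 2 ** (bits - 1) - 1  # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.max(np.abs(v)) / qmax + 1e-12
    return np.clip(np.round(v / scale), -qmax, qmax) * scale

def multipoint_quantize(w, bits=2, n_points=3):
    # Greedily approximate w by sum_i a_i * q_i with low-bit vectors q_i.
    residual = w.astype(float).copy()
    coeffs, points = [], []
    for _ in range(n_points):
        q = quantize_lowbit(residual, bits)
        denom = float(q @ q)
        if denom == 0.0:
            break
        a = float(residual @ q) / denom  # least-squares coefficient for this point
        coeffs.append(a)
        points.append(q)
        residual -= a * q  # shrink the residual before picking the next point
    return coeffs, points
```

Because each coefficient is chosen by least squares, adding points can only shrink (never grow) the approximation error, which is what yields the "effect of mixed precision" when more points are spent on important weight vectors.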
Citations
Proceedings ArticleDOI
01 Jun 2022
TL;DR: This paper proposes IntraQ, a zero-shot quantization method with a local object reinforcement that locates target objects at different scales and positions in the synthetic images, and a marginal distance constraint that forms class-related features distributed over a coarse area.
Abstract: Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks with low-bit integers without accessing any real data. In this paper, we observe an interesting phenomenon of intra-class heterogeneity in real data and show that existing methods fail to retain this property in their synthetic images, which limits their performance gains. To address this issue, we propose a novel zero-shot quantization method referred to as IntraQ. First, we propose a local object reinforcement that locates target objects at different scales and positions in the synthetic images. Second, we introduce a marginal distance constraint to form class-related features distributed over a coarse area. Lastly, we devise a soft inception loss that injects a soft prior label to prevent the synthetic images from overfitting to a fixed object. IntraQ is shown to retain the intra-class heterogeneity in the synthetic images and achieves state-of-the-art performance. For example, compared to the advanced ZSQ methods, IntraQ obtains a 9.17% top-1 accuracy increase on ImageNet when all layers of MobileNetV1 are quantized to 4-bit. Code is at https://github.com/zysxmu/IntraQ
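The soft inception loss can be read as cross-entropy against a softened prior label rather than a hard one-hot target. The sketch below is a simplified interpretation under that assumption (the mixing weight `epsilon` and the exact target construction are illustrative, not IntraQ's published formulation):

```python
import numpy as np

def soft_inception_loss(logits, label, epsilon=0.1):
    # Cross-entropy against a softened prior label: the hard target is mixed
    # with a uniform distribution so the synthetic image is not pushed to
    # overfit a single fixed object (simplified reading of IntraQ's loss).
    k = logits.shape[-1]
    target = np.full(k, epsilon / k)
    target[label] += 1.0 - epsilon
    m = logits.max()
    logp = logits - m - np.log(np.exp(logits - m).sum())  # stable log-softmax
    return float(-(target * logp).sum())
```

With `epsilon=0` this reduces to standard cross-entropy; larger `epsilon` keeps some probability mass on other classes during image synthesis.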

8 citations

Proceedings ArticleDOI
23 May 2022
TL;DR: This work proposes two approaches for image-aware and pixel-aware dynamic binarization in a human pose estimation model, improving mAP by 5.2% and 3.6% on the COCO test-dev benchmark for ResNet-18/34 architectures, respectively.
Abstract: Binary neural networks (BNNs) contribute greatly to the efficiency of image classification models. However, in dense prediction tasks such as human pose estimation, predictions at different locations are coupled and rely on features extracted across entire images. As a result, more robust and adaptive binarization is required to bridge the performance gap between binarized and full-precision models. We propose two approaches for image-aware and pixel-aware dynamic binarization in a human pose estimation model. First, simplified dynamic thresholding is applied in the backbone to determine a unique binarization threshold for each image. Second, in the decoder, we decouple binarization for each pixel according to the activations surrounding it, with proposed dynamic filtering modules determining a different binarization strategy per pixel. Compared with strong baselines, the proposed framework improves mAP by 5.2% and 3.6% on the COCO test-dev benchmark for ResNet-18/34 architectures, respectively.
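The image-aware part of the idea can be sketched very simply: derive a per-image threshold from the activation statistics and binarize against it. The threshold function here (the batch mean) is a placeholder assumption; the paper learns its thresholding rather than hard-coding it.

```python
import numpy as np

def dynamic_binarize(x):
    # Image-aware dynamic binarization (minimal sketch): compute a threshold
    # from this image's own activation statistics, then map to {-1, +1}.
    t = x.mean()  # hypothetical per-image threshold; learned in the paper
    return np.where(x >= t, 1.0, -1.0), t
```

Pixel-aware binarization would refine this further by computing a separate threshold per spatial location from its surrounding activations.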

2 citations

Journal ArticleDOI
Debbie Hughes
TL;DR: This paper proposes a fine-grained data distribution alignment (FDDA) method to boost post-training quantization, based on two properties of batch normalization statistics (BNS) observed in deep layers of the trained network: inter-class separation and intra-class incohesion.
Abstract: While post-training quantization owes its popularity largely to not requiring access to the original training dataset, its performance also suffers from the scarcity of images. To alleviate this limitation, in this paper we use the synthetic data introduced by zero-shot quantization as the calibration dataset and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization. The method is based on two important properties of batch normalization statistics (BNS) that we observe in deep layers of the trained network: inter-class separation and intra-class incohesion. To preserve this fine-grained distribution information: 1) We compute the per-class BNS of the calibration dataset as the BNS center of each class and propose a BNS-centralized loss that forces the synthetic data distributions of different classes to be close to their own centers. 2) We add Gaussian noise to the centers to imitate the incohesion and propose a BNS-distorted loss that forces the synthetic data distribution of each class to be close to its distorted center. By utilizing these two fine-grained losses, our method achieves state-of-the-art performance on ImageNet, especially when both the first and last layers are quantized to low bit-widths. Code is at https://github.com/zysxmu/FDDA .
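The two losses above can be sketched as distances between a synthetic batch's feature statistics and a class's BNS center, with the distorted variant adding Gaussian noise to the center. The L2 distance and the noise scale `sigma` are assumptions for illustration, not FDDA's exact formulation:

```python
import numpy as np

def bns_centralized_loss(batch_feats, center_mean, center_var):
    # Pull this class's synthetic-batch statistics toward its BNS center
    # (sketch of FDDA's BNS-centralized loss; L2 distance assumed).
    mu = batch_feats.mean(axis=0)
    var = batch_feats.var(axis=0)
    return float(((mu - center_mean) ** 2).sum() + ((var - center_var) ** 2).sum())

def bns_distorted_loss(batch_feats, center_mean, center_var, sigma=0.1, rng=None):
    # Same alignment, but against a Gaussian-distorted center, imitating
    # intra-class incohesion (sketch of FDDA's BNS-distorted loss).
    rng = np.random.default_rng() if rng is None else rng
    return bns_centralized_loss(
        batch_feats,
        center_mean + rng.normal(0.0, sigma, center_mean.shape),
        center_var + rng.normal(0.0, sigma, center_var.shape),
    )
```

In use, both losses would be added to the image-synthesis objective so the generated calibration images reproduce the per-class statistics of real data.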

1 citation

Book ChapterDOI
01 Jan 2022
TL;DR: This paper proposes BCDNet, which uses a binary multi-layer perceptron (MLP) block as an alternative to binary convolution blocks to directly model contextual dependencies: binary MLPs model both short-range and long-range feature dependencies, where the former provides local inductive bias and the latter breaks the limited receptive field of binary convolutions.
Abstract: Existing binary neural networks (BNNs) mainly operate on local convolutions with binarization functions. However, such simple bit operations lack the ability to model contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules that enable BNNs to learn effective contextual dependencies. First, we propose a binary multi-layer perceptron (MLP) block as an alternative to binary convolution blocks to directly model contextual dependencies. Both short-range and long-range feature dependencies are modeled by binary MLPs, where the former provides local inductive bias and the latter breaks the limited receptive field of binary convolutions. Second, to improve the robustness of binary models with contextual dependencies, we compute contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks. Armed with our binary MLP blocks and improved binary convolutions, we build BNNs with explicit contextual dependency modeling, termed BCDNet. On the standard ImageNet-1K classification benchmark, BCDNet achieves 72.3% top-1 accuracy and outperforms leading binary methods by a large margin. In particular, it exceeds the state-of-the-art ReActNet-A by 2.9% top-1 accuracy with similar operations. Our code is available at https://github.com/Sense-GVT/BCDNet .
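A binary MLP block can be pictured as binary linear layers with a hard sign activation in between. The sketch below uses XNOR-style weight binarization (sign of the weights with a per-output scaling factor); the layer layout and scaling scheme are assumptions standing in for BCDNet's actual block design:

```python
import numpy as np

def binary_linear(x, w_real):
    # Binary linear layer (sketch): weights are binarized to sign(w) with a
    # per-output-channel scale alpha = mean(|w|), XNOR-Net style.
    w_bin = np.sign(w_real)
    w_bin[w_bin == 0] = 1.0
    alpha = np.abs(w_real).mean(axis=1, keepdims=True)
    return x @ (alpha * w_bin).T

def binary_mlp_block(x, w1, w2):
    # Two binary linear layers with a hard sign activation in between; a
    # minimal stand-in for a binary MLP block modeling feature dependencies.
    h = np.sign(binary_linear(x, w1))
    h[h == 0] = 1.0
    return binary_linear(h, w2)
```

Because the weights collapse to signs plus one scale per output channel, the matrix products can be implemented with bit operations and popcounts on real hardware.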

1 citation

Book ChapterDOI
01 Jan 2023
TL;DR: In this article, the authors develop accurate power consumption models for all arithmetic operations in a DNN under various working conditions and present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant.
Abstract: Existing approaches for reducing DNN power consumption rely on quite general principles, including avoiding multiplication operations and aggressively quantizing weights and activations. However, these methods do not consider the precise power consumed by each module in the network and are therefore not optimal. In this paper we develop accurate power consumption models for all arithmetic operations in the DNN, under various working conditions, and reveal several important factors that have been overlooked to date. Based on our analysis, we present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant. Our method can be applied to a pre-trained network and can also be used during training to achieve improved performance. Unlike previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision network and makes it possible to seamlessly traverse the power-accuracy trade-off at deployment time.
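For readers unfamiliar with the fixed-precision format such a variant targets, here is a minimal sketch of fixed-point quantization only. It illustrates the number format (a sign bit plus `int_bits` integer and `frac_bits` fractional bits), not PANN's actual construction or its power models, which are considerably more involved:

```python
import numpy as np

def to_fixed_point(x, int_bits=2, frac_bits=6):
    # Quantize to a signed fixed-point grid with step 2**-frac_bits.
    # In-range values land within half a step of the original.
    scale = 2.0 ** frac_bits
    limit = 2.0 ** (int_bits + frac_bits - 1) - 1  # largest representable code
    return np.clip(np.round(x * scale), -limit, limit) / scale
```

The power-accuracy trade-off the abstract mentions corresponds to varying the total bit-width: fewer bits mean cheaper arithmetic but a coarser grid.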