Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors

doi:10.1109/TCSVT.2020.3020569



Duy Thanh Nguyen and 2 co-authors

IEEE Transactions on Circuits and System..., Vol. 31, Iss. 6, pp. 2450-2464, 01 Jun 2021


TLDR

A layer-specific design that employs different organizations optimized for the different layers of a CNN and significantly outperforms previous works in terms of throughput, off-chip access, and on-chip memory requirements.

Abstract:

Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to low processing speed and high power dissipation. Although the characteristics of the layers in a CNN often differ substantially, previous hardware designs have applied a common optimization scheme to all of them. This paper proposes a layer-specific design that employs different organizations optimized for the different layers. The proposed design employs two layer-specific optimizations: layer-specific mixed data flow and layer-specific mixed precision. The mixed data flow aims to minimize off-chip access while demanding minimal on-chip memory (BRAM) resources of an FPGA device. The mixed-precision quantization aims to achieve both lossless accuracy and aggressive model compression, thereby further reducing off-chip access. A Bayesian optimization approach is used to select the best sparsity for each layer, achieving the best trade-off between accuracy and compression. This mixing scheme allows the entire network model to be stored in the BRAMs of the FPGA, which aggressively reduces off-chip access and thereby significantly enhances performance. The model size is reduced by 22.66 to 28.93 times compared with a full-precision network, with negligible accuracy degradation on the VOC, COCO, and ImageNet datasets. Furthermore, the combination of mixed data flow and mixed precision significantly outperforms previous works in terms of throughput, off-chip access, and on-chip memory requirements.
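The following minimal Python sketch is not the authors' implementation. It only illustrates the arithmetic behind the two layer-specific knobs, under stated assumptions. Everything here is a hypothetical placeholder: the layer names, parameter counts, bit-widths, sparsities, the data-flow names `keep_weights` and `keep_inputs`, and all cost numbers. Pruned weights are assumed to cost no storage, with index overhead ignored. Layers are assumed to run one at a time and share one BRAM buffer, so the budget constrains each layer separately. The per-layer (bit-width, sparsity) choices are fixed by hand rather than searched with Bayesian optimization.

```python
"""Toy arithmetic for layer-specific mixed precision and mixed data flow.

All layer names, sizes, bit-widths, sparsities, flow names, and costs below
are hypothetical placeholders, not values from the paper.
"""

# --- Mixed precision: per-layer (name, parameter count, weight bits, sparsity) ---
# Sparsity is the fraction of weights pruned to zero; this sketch assumes pruned
# weights cost no storage and ignores any index/encoding overhead.
LAYERS = [
    ("conv1", 864, 8, 0.0),
    ("conv2", 18_432, 4, 0.5),
    ("conv3", 73_728, 2, 0.5),
    ("conv4", 294_912, 2, 0.5),
]


def model_bits(layers):
    """Total weight storage in bits for the chosen per-layer bits and sparsity."""
    return sum(n * (1.0 - s) * b for _, n, b, s in layers)


def compression_ratio(layers, full_bits=32):
    """Full-precision model size divided by the mixed-precision model size."""
    total_params = sum(n for _, n, _, _ in layers)
    return full_bits * total_params / model_bits(layers)


def fits_on_chip(layers, bram_bytes):
    """True if all weights fit in the given on-chip (BRAM) capacity in bytes."""
    return model_bits(layers) / 8 <= bram_bytes


def avg_bits_per_weight(ratio, full_bits=32):
    """Average stored bits per weight implied by a compression ratio vs. full_bits."""
    return full_bits / ratio


# --- Mixed data flow: per-layer candidates {flow: (offchip_KB, bram_KB)} ---
# Assumes layers run one at a time and share one BRAM buffer, so the budget
# constrains each layer separately (peak usage, not a sum over layers).
CANDIDATES = [
    {"keep_weights": (120, 10), "keep_inputs": (100, 60)},
    {"keep_weights": (90, 15), "keep_inputs": (70, 40)},
    {"keep_weights": (60, 30), "keep_inputs": (55, 20)},
    {"keep_weights": (40, 45), "keep_inputs": (50, 10)},
]


def select_dataflows(candidates, bram_budget):
    """For each layer, pick the flow with the least off-chip traffic that fits the budget."""
    plan, total = [], 0
    for i, options in enumerate(candidates):
        feasible = {name: cost for name, cost in options.items() if cost[1] <= bram_budget}
        if not feasible:
            raise ValueError(f"layer {i}: no data flow fits in {bram_budget} KB of BRAM")
        best = min(feasible, key=lambda name: feasible[name][0])
        plan.append(best)
        total += feasible[best][0]
    return plan, total


def uniform_cost(candidates, flow, bram_budget):
    """Off-chip traffic if every layer uses the same flow; None if any layer overflows BRAM."""
    if any(options[flow][1] > bram_budget for options in candidates):
        return None
    return sum(options[flow][0] for options in candidates)


if __name__ == "__main__":
    print(f"compression vs 32-bit: {compression_ratio(LAYERS):.2f}x")
    print("fits in 64 KiB:", fits_on_chip(LAYERS, 64 * 1024))
    print("mixed plan:", select_dataflows(CANDIDATES, bram_budget=50))
    print("all keep_weights:", uniform_cost(CANDIDATES, "keep_weights", 50))
```

With these toy numbers and a 50 KB budget, the per-layer choice picks `keep_weights` for the first and last layers and `keep_inputs` for the middle two. That plan needs 285 KB of off-chip traffic, versus 310 KB when every layer uses `keep_weights`. Using `keep_inputs` everywhere does not fit the budget at all. As pure arithmetic, the reported 22.66 to 28.93 times reduction relative to 32-bit weights corresponds to an average of between about 1.11 and 1.41 stored bits per weight.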
