Open Access · Journal Article · DOI

Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors

TLDR
A layer-specific design that employs different organizations optimized for the different layers, significantly outperforming previous works in terms of throughput, off-chip access, and on-chip memory requirement.
Abstract
Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to a low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are frequently quite different, previous hardware designs have employed common optimization schemes for all of them. This paper proposes a layer-specific design that employs different organizations optimized for the different layers. The proposed design employs two layer-specific optimizations: layer-specific mixed data flow and layer-specific mixed precision. The mixed data flow aims to minimize the off-chip access while demanding a minimal on-chip memory (BRAM) resource of an FPGA device. The mixed precision quantization aims to achieve both lossless accuracy and aggressive model compression, thereby further reducing the off-chip access. A Bayesian optimization approach is used to select the best sparsity for each layer, achieving the best trade-off between accuracy and compression. This mixing scheme allows the entire network model to be stored in the BRAMs of the FPGA, aggressively reducing the off-chip access and thereby achieving a significant performance enhancement. The model size is reduced by 22.66-28.93 times compared to a full-precision network, with a negligible degradation of accuracy on the VOC, COCO, and ImageNet datasets. Furthermore, the combination of mixed data flow and mixed precision significantly outperforms the previous works in terms of throughput, off-chip access, and on-chip memory requirement.
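To make the mixed-precision idea concrete, the sketch below shows symmetric uniform quantization of a layer's weights to a chosen bit width, and the model-size reduction obtained when different layers use different bit widths. This is a minimal illustration, not the paper's implementation: the function names and the equal-parameter-count assumption in `compression_ratio` are hypothetical.

```python
def quantize_layer(weights, bits):
    """Quantize a list of float weights to signed `bits`-bit integers
    using symmetric uniform quantization; returns (q, dequantized, scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4-bit
    scale = max_abs / levels
    q = [round(w / scale) for w in weights]
    deq = [v * scale for v in q]          # reconstructed values for accuracy checks
    return q, deq, scale

def compression_ratio(layer_bits, baseline_bits=32):
    """Model-size reduction when layer i uses layer_bits[i] bits instead of
    32-bit floats; assumes (for illustration) equal parameter counts per layer."""
    avg_bits = sum(layer_bits) / len(layer_bits)
    return baseline_bits / avg_bits

q, deq, scale = quantize_layer([0.5, -0.25, 0.1], bits=4)
print(compression_ratio([1, 1, 2, 2]))   # 32 / 1.5, roughly 21x
```

In the paper's scheme the per-layer bit widths (sparsity levels) are not hand-picked as above but searched by Bayesian optimization to balance accuracy against compression.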


Citations
Journal ArticleDOI

Zero-Centered Fixed-Point Quantization With Iterative Retraining for Deep Convolutional Neural Network-Based Object Detectors

TL;DR: In this article, the center of the weight distribution is adjusted to zero by subtracting the mean of weight parameters before quantization, and the retraining process is iteratively applied to minimize the accuracy drop caused by quantization.
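The zero-centering step described above can be sketched as follows: subtract the mean of the weights before symmetric quantization, then add it back on dequantization. This is a hedged illustration of the general technique, assuming plain uniform quantization; the function names are illustrative and not from the cited article, and the iterative retraining loop is omitted.

```python
def zero_center_quantize(weights, bits):
    """Center the weight distribution at zero by subtracting its mean,
    then apply symmetric uniform quantization; returns (q, mean, scale)."""
    mean = sum(weights) / len(weights)
    centered = [w - mean for w in weights]
    max_abs = max(abs(w) for w in centered) or 1.0
    scale = max_abs / (2 ** (bits - 1) - 1)
    q = [round(w / scale) for w in centered]
    return q, mean, scale

def dequantize(q, mean, scale):
    """Invert the quantization: rescale and restore the subtracted mean."""
    return [v * scale + mean for v in q]

# A distribution centered at 1.5 uses the full quantization range after centering.
q, mean, scale = zero_center_quantize([1.0, 1.5, 2.0], bits=4)
print(dequantize(q, mean, scale))   # recovers [1.0, 1.5, 2.0] exactly here
```

Without centering, the same weights would waste half the signed range on one side of zero, which is the loss the zero-centered scheme avoids.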
Journal ArticleDOI

Real-Time SSDLite Object Detection on FPGA

TL;DR: This article proposes an efficient computing system for real-time SSDLite object detection on FPGA devices, which includes novel hardware architecture and system optimization techniques.
Journal ArticleDOI

Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

TL;DR: This paper compares the latency, accuracy, training time, and hardware costs of neural networks compressed with the authors' new multi-objective evolutionary algorithm, NEMOKD, and with quantisation, and identifies a sweet spot of 3-bit precision in the trade-off between latency, hardware requirements, training time, and accuracy.
Journal ArticleDOI

Resource-constrained FPGA implementation of YOLOv2

TL;DR: In this paper, a scalable cross-layer dataflow strategy is proposed which allows on-chip data transfer between different types of layers and offers flexible off-chip data transfer when the intermediate results are unaffordable on-chip.
Proceedings ArticleDOI

An Adaptive Row-based Weight Reuse Scheme for FPGA Implementation of Convolutional Neural Networks

TL;DR: In this article, a row-based weight reuse scheme is used to reduce the input/output buffer size of U-Net by applying a different level of row reuse to each layer depending on its characteristics.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the framework won 1st place in the ILSVRC 2015 classification task.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Journal ArticleDOI

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.