Proceedings ArticleDOI

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

TLDR
Results show that the proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects including Intersection-over-Union (IoU) and energy efficiency.
Abstract
While embedded FPGAs are attractive platforms for DNN acceleration on edge devices due to their low latency and high energy efficiency, the scarcity of resources on edge-scale FPGA devices also makes DNN deployment challenging. In this paper, we propose a simultaneous FPGA/DNN co-design methodology with both bottom-up and top-down approaches: a bottom-up hardware-oriented DNN model search for high accuracy, and a top-down FPGA accelerator design considering DNN-specific characteristics. We also build an automatic co-design flow, including an Auto-DNN engine to perform hardware-oriented DNN model search, as well as an Auto-HLS engine to generate synthesizable C code of the FPGA accelerator for explored DNNs. We demonstrate our co-design approach on an object detection task using a PYNQ-Z1 FPGA. Results show that our proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects, including Intersection-over-Union (IoU) (6.2% higher), frames per second (FPS) (2.48× higher), power consumption (40% lower), and energy efficiency (2.5× higher). Compared to GPU-based solutions, our designs deliver similar accuracy but consume far less energy.
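The co-design loop the abstract describes can be sketched roughly as follows. This is a toy illustration only: the resource model, FPS target, knob names, and accuracy proxy below are invented assumptions, not the paper's actual Auto-DNN/Auto-HLS engines.

```python
# Hypothetical sketch of the co-design idea: a bottom-up model search
# pruned by a top-down hardware estimate. All numbers and helper names
# are illustrative, not the paper's Auto-DNN/Auto-HLS implementation.

from itertools import product

# Candidate DNN knobs: channel-width multiplier and layer count.
WIDTHS = [0.5, 0.75, 1.0]
DEPTHS = [8, 12, 16]

def estimate_fps(width, depth, dsps=220):
    """Toy analytical model: FPS falls as compute grows, for a fixed DSP budget."""
    macs = width * width * depth * 1e8   # rough per-frame MAC count (assumed)
    return dsps * 2e8 / macs             # assume 2e8 MACs/s per DSP

def estimate_accuracy(width, depth):
    """Toy proxy: bigger models score higher, with no diminishing returns modeled."""
    return 0.5 + 0.3 * width + 0.01 * depth

def co_search(min_fps=30.0):
    """Keep only candidates meeting the FPS target, then maximize accuracy."""
    feasible = [
        (estimate_accuracy(w, d), w, d)
        for w, d in product(WIDTHS, DEPTHS)
        if estimate_fps(w, d) >= min_fps
    ]
    return max(feasible) if feasible else None

best = co_search()
print(best)  # (accuracy proxy, width, depth) of the best feasible candidate
```

The point of the sketch is the pruning order: hardware feasibility (top-down) filters the space before accuracy (bottom-up) ranks the survivors, which is how a co-design search avoids training models that could never meet the latency target.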


Citations
Journal ArticleDOI

Hardware/Software Co-Exploration of Neural Architectures

TL;DR: The co-exploration framework is demonstrated to effectively expand the search space to incorporate models with high accuracy, and the proposed two-level optimization is theoretically shown to efficiently prune inferior solutions to better explore the search space.
Proceedings ArticleDOI

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning

TL;DR: ConfuciuX demonstrates the highest sample efficiency for training compared to other techniques such as Bayesian optimization, genetic algorithms, simulated annealing, and other RL methods, and converges to the optimized hardware configuration 4.7 to 24 times faster than alternate techniques.
Proceedings ArticleDOI

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

TL;DR: This work is the first to propose a fully simultaneous, Efficient Differentiable DNN (deep neural network) architecture and implementation co-search (EDD) methodology, which achieves accuracy similar to the best existing DNN models found by neural architecture search methods on ImageNet, but with superior performance, obtained within a 12-GPU-hour search.
Proceedings ArticleDOI

Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

TL;DR: This paper defines the Codesign-NAS multiobjective optimization problem, demonstrates its effectiveness, and explores different ways of navigating the co-design search space to find the Pareto frontier within that large space.
Proceedings ArticleDOI

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

TL;DR: This paper builds an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced, and proposes a framework, namely ASICNAS, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place on the ILSVRC 2015 classification task.
Proceedings ArticleDOI

MobileNetV2: Inverted Residuals and Linear Bottlenecks

TL;DR: MobileNetV2 is based on an inverted residual structure in which the shortcut connections are between the thin bottleneck layers, while the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
Posted Content

Neural Architecture Search with Reinforcement Learning

Barret Zoph, +1 more · 05 Nov 2016
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
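The controller-plus-reward loop behind this kind of RL-based architecture search can be illustrated with a minimal REINFORCE sketch. Everything below is a stand-in: the three-way filter-count choice, the hand-coded reward (replacing validation accuracy), and the plain softmax policy (replacing the paper's RNN controller) are all simplifying assumptions.

```python
# Minimal REINFORCE-style sketch of controller-based architecture search,
# in the spirit of Zoph & Le: sample an architectural choice, score it,
# and nudge the policy toward higher-reward choices. The reward function
# is a hand-coded stand-in for validation accuracy, not a real training run.

import math
import random

random.seed(0)

CHOICES = [16, 32, 64]        # e.g. candidate filter counts for one layer
logits = [0.0, 0.0, 0.0]      # controller parameters (softmax logits)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def reward(choice):
    """Toy proxy reward: 32 filters is 'best' for this made-up task."""
    return {16: 0.6, 32: 0.9, 64: 0.7}[choice]

baseline = 0.0
for step in range(1000):
    probs = softmax(logits)
    i = random.choices(range(len(CHOICES)), weights=probs)[0]
    r = reward(CHOICES[i])
    baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline
    advantage = r - baseline
    # REINFORCE gradient ascent on the log-prob of the sampled choice:
    # d log p_i / d logit_j = (1 if j == i else 0) - p_j
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += 0.1 * advantage * grad

best = CHOICES[max(range(len(logits)), key=lambda j: logits[j])]
print(best)  # the controller concentrates on the highest-reward choice
```

The moving-average baseline is the standard variance-reduction trick: without it, every sampled choice gets a positive push early on and convergence is much noisier.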
Proceedings ArticleDOI

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

TL;DR: This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, significantly outperforming previous approaches.
Posted Content

Learning Transferable Architectures for Scalable Image Recognition

TL;DR: This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.