Proceedings ArticleDOI

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

TLDR
Results show that the proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects including Intersection-over-Union (IoU) and energy efficiency.
Abstract
While embedded FPGAs are attractive platforms for DNN acceleration on edge devices due to their low latency and high energy efficiency, the scarcity of resources on edge-scale FPGA devices also makes DNN deployment challenging. In this paper, we propose a simultaneous FPGA/DNN co-design methodology with both bottom-up and top-down approaches: a bottom-up hardware-oriented DNN model search for high accuracy, and a top-down FPGA accelerator design considering DNN-specific characteristics. We also build an automatic co-design flow, including an Auto-DNN engine to perform hardware-oriented DNN model search, as well as an Auto-HLS engine to generate synthesizable C code of the FPGA accelerator for explored DNNs. We demonstrate our co-design approach on an object detection task using a PYNQ-Z1 FPGA. Results show that our proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects, including Intersection-over-Union (IoU) (6.2% higher), frames per second (FPS) (2.48× higher), power consumption (40% lower), and energy efficiency (2.5× higher). Compared to GPU-based solutions, our designs deliver similar accuracy but consume far less energy.
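The co-design loop the abstract describes can be sketched roughly as follows. This is a toy illustration only: the resource model, FPS target, knob names, and accuracy proxy below are invented assumptions, not the paper's actual Auto-DNN/Auto-HLS engines.

```python
# Hypothetical sketch of the co-design idea: a bottom-up model search
# pruned by a top-down hardware estimate. All numbers and helper names
# are illustrative, not the paper's Auto-DNN/Auto-HLS implementation.

from itertools import product

# Candidate DNN knobs: channel-width multiplier and layer count.
WIDTHS = [0.5, 0.75, 1.0]
DEPTHS = [8, 12, 16]

def estimate_fps(width, depth, dsps=220):
    """Toy analytical model: FPS falls as compute grows, for a fixed DSP budget."""
    macs = width * width * depth * 1e8   # rough per-frame MAC count (assumed)
    return dsps * 2e8 / macs             # assume 2e8 MACs/s per DSP

def estimate_accuracy(width, depth):
    """Toy proxy: bigger models score higher, with no diminishing returns modeled."""
    return 0.5 + 0.3 * width + 0.01 * depth

def co_search(min_fps=30.0):
    """Keep only candidates meeting the FPS target, then maximize accuracy."""
    feasible = [
        (estimate_accuracy(w, d), w, d)
        for w, d in product(WIDTHS, DEPTHS)
        if estimate_fps(w, d) >= min_fps
    ]
    return max(feasible) if feasible else None

best = co_search()
print(best)  # (accuracy proxy, width, depth) of the best feasible candidate
```

The point of the sketch is the pruning order: hardware feasibility (top-down) filters the space before accuracy (bottom-up) ranks the survivors, which is how a co-design search avoids training models that could never meet the latency target.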


Citations
Journal ArticleDOI

Hardware/Software Co-Exploration of Neural Architectures

TL;DR: The co-exploration framework is demonstrated to effectively expand the search space to incorporate models with high accuracy, and the proposed two-level optimization is theoretically shown to efficiently prune inferior solutions to better explore the search space.
Proceedings ArticleDOI

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning

TL;DR: ConfuciuX demonstrates the highest sample efficiency for training compared to other techniques such as Bayesian optimization, genetic algorithms, simulated annealing, and other RL methods, and converges to the optimized hardware configuration 4.7 to 24 times faster than alternate techniques.
Proceedings ArticleDOI

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

TL;DR: This work is the first to propose a fully simultaneous, Efficient Differentiable DNN (deep neural network) architecture and implementation co-search (EDD) methodology, which achieves accuracy similar to the best existing DNN models found by neural architecture search methods on ImageNet, but with superior performance, obtained within a 12-GPU-hour search.
Proceedings ArticleDOI

Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

TL;DR: This paper defines the Codesign-NAS multiobjective optimization problem, demonstrates its effectiveness, and explores different ways of navigating the co-design search space to find the Pareto frontier within that large space.
Proceedings ArticleDOI

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

TL;DR: This paper builds an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced, and proposes a framework, namely ASICNAS, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place on the ILSVRC 2015 classification task.
Proceedings ArticleDOI

MobileNetV2: Inverted Residuals and Linear Bottlenecks

TL;DR: MobileNetV2 is based on an inverted residual structure in which the shortcut connections are between the thin bottleneck layers, while the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
Posted Content

Neural Architecture Search with Reinforcement Learning

Barret Zoph, +1 more · 05 Nov 2016
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
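The controller-plus-reward loop behind this kind of RL-based architecture search can be illustrated with a minimal REINFORCE sketch. Everything below is a stand-in: the three-way filter-count choice, the hand-coded reward (replacing validation accuracy), and the plain softmax policy (replacing the paper's RNN controller) are all simplifying assumptions.

```python
# Minimal REINFORCE-style sketch of controller-based architecture search,
# in the spirit of Zoph & Le: sample an architectural choice, score it,
# and nudge the policy toward higher-reward choices. The reward function
# is a hand-coded stand-in for validation accuracy, not a real training run.

import math
import random

random.seed(0)

CHOICES = [16, 32, 64]        # e.g. candidate filter counts for one layer
logits = [0.0, 0.0, 0.0]      # controller parameters (softmax logits)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def reward(choice):
    """Toy proxy reward: 32 filters is 'best' for this made-up task."""
    return {16: 0.6, 32: 0.9, 64: 0.7}[choice]

baseline = 0.0
for step in range(1000):
    probs = softmax(logits)
    i = random.choices(range(len(CHOICES)), weights=probs)[0]
    r = reward(CHOICES[i])
    baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline
    advantage = r - baseline
    # REINFORCE gradient ascent on the log-prob of the sampled choice:
    # d log p_i / d logit_j = (1 if j == i else 0) - p_j
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += 0.1 * advantage * grad

best = CHOICES[max(range(len(logits)), key=lambda j: logits[j])]
print(best)  # the controller concentrates on the highest-reward choice
```

The moving-average baseline is the standard variance-reduction trick: without it, every sampled choice gets a positive push early on and convergence is much noisier.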
Proceedings ArticleDOI

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

TL;DR: This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, significantly outperforming previous approaches.
Posted Content

Learning Transferable Architectures for Scalable Image Recognition

TL;DR: This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.