Open Access Proceedings ArticleDOI

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

TLDR
In this article, the authors present FINN, a framework for building fast and flexible FPGA accelerators based on a heterogeneous streaming architecture. It implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.
Abstract
Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 µs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 µs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.
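The throughput figures above rest on the standard binarized-network identity: once weights and activations are constrained to {-1, +1} and packed as bits (1 encoding +1, 0 encoding -1), a dot product reduces to an XNOR followed by a population count. A minimal Python sketch of that arithmetic (the packing convention and function name are illustrative, not FINN's actual HLS implementation):

```python
def bnn_dot(w_bits: int, a_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bits
    (element i in bit i; bit = 1 encodes +1, bit = 0 encodes -1).

    Each position contributes +1 when the bits match and -1 otherwise,
    so sum(w * a) = matches - mismatches = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(w_bits ^ a_bits) & mask
    return 2 * bin(xnor).count("1") - n

# w = [+1, -1, +1, +1] packs to 0b1101; a = [+1, +1, -1, +1] packs to 0b1011.
# Products are +1, -1, -1, +1, so the dot product is 0.
assert bnn_dot(0b1101, 0b1011, 4) == 0
```

On an FPGA this XNOR-popcount replaces a multiply-accumulate tree entirely with LUT logic, which is what makes the per-layer throughput scaling described in the abstract feasible.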


Citations
Journal ArticleDOI

A Neuromorphic Chip Optimized for Deep Learning and CMOS Technology With Time-Domain Analog and Digital Mixed-Signal Processing

TL;DR: This work proposes the time-domain neural network (TDNN), which employs time-domain analog and digital mixed-signal processing (TDAMS) using delay time as the analog signal. TDAMS not only exploits energy-efficient analog computing but, thanks to its hardware efficiency, also enables a fully spatially unrolled architecture.
Proceedings ArticleDOI

Scalable high-performance architecture for convolutional ternary neural networks on FPGA

TL;DR: This work presents a highly versatile, FPGA-friendly architecture for ternary neural networks (TNNs) in which both the bit width of the input data and the level of parallelism can be varied at synthesis time, allowing throughput to be traded against hardware resources and power consumption.
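For context, ternary weights in {-1, 0, +1} are commonly obtained by symmetric thresholding of real-valued weights around zero; the zeros are what let ternary hardware skip work that a purely binary datapath cannot. A sketch under that assumption (the threshold value and names are illustrative, not taken from this paper):

```python
import numpy as np

def ternarize(w: np.ndarray, t: float = 0.05) -> np.ndarray:
    """Map real-valued weights to {-1, 0, +1} with a dead zone of width 2t.

    Values in [-t, t] become 0; elsewhere only the sign is kept.
    """
    q = np.sign(w)
    q[np.abs(w) <= t] = 0
    return q

w = np.array([0.30, -0.02, -0.40, 0.01])
print(ternarize(w))  # [ 1.  0. -1.  0.]
```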
Proceedings ArticleDOI

FPGA-based CNN inference accelerator synthesized from multi-threaded C software

TL;DR: In this article, a deep learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads, using the LegUp high-level synthesis (HLS) tool, which synthesizes the threads into parallel FPGA hardware and thereby translates software parallelism into spatial parallelism.
Journal ArticleDOI

Memristor-Based Binarized Spiking Neural Networks: Challenges and applications

TL;DR: The challenges that face the memristor-based acceleration of NNs and how binarized SNNs (BSNNs) may offer a good fit for these emerging hardware systems are explored.
Journal ArticleDOI

An Always-On 3.8 µJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS

TL;DR: A mixed-signal binary convolutional neural network (CNN) processor for always-on inference applications that achieves 3.8 µJ per classification, an improvement over the previous low-energy benchmark on CIFAR-10, achieved in part by sacrificing some programmability.
References
Posted Content

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

TL;DR: BinaryNet is introduced, a method for training DNNs whose binary weights and activations are used both at run-time and when computing the parameters' gradients. It drastically reduces memory usage and replaces most multiplications by 1-bit exclusive-NOR (XNOR) operations, which might have a big impact on both general-purpose and dedicated deep learning hardware.
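The trick that makes such training work is the straight-through estimator: the forward pass uses the sign of each weight, while the backward pass treats the sign function as the identity, clipped so gradients vanish where |w| > 1 and the underlying real-valued weights stay bounded. A minimal numpy sketch of that single step (function names are ours; this is not the paper's full training loop):

```python
import numpy as np

def binarize_forward(w: np.ndarray) -> np.ndarray:
    """Forward pass: constrain weights to {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_backward(w: np.ndarray, grad_out: np.ndarray) -> np.ndarray:
    """Straight-through estimator: pass the incoming gradient through
    unchanged, but zero it where |w| > 1."""
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([0.7, -1.3, -0.2])
print(binarize_forward(w))                    # [ 1. -1. -1.]
print(binarize_backward(w, np.ones_like(w)))  # [1. 0. 1.]
```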
Proceedings Article

High Performance Convolutional Neural Networks for Document Processing

TL;DR: Three novel approaches to speeding up CNNs are presented: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphic processing units).
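The "unrolling" in (a) is what is now usually called im2col: each receptive field of the input is copied into the row of a matrix, so the whole convolution collapses into a single BLAS matrix product. A minimal 2-D, single-channel Python sketch (names and shapes are ours, chosen for illustration):

```python
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """Unroll every k x k patch of a 2-D image into one row."""
    h, w = x.shape
    rows = [x[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
    return np.stack(rows)                # shape: (num_patches, k * k)

x = np.arange(16, dtype=float).reshape(4, 4)
kern = np.ones((3, 3))
cols = im2col(x, 3)                      # (4, 9)
out = cols @ kern.ravel()                # convolution as one matrix product
print(out.reshape(2, 2))                 # equals the direct 3x3 valid conv
```

The copy inflates memory by roughly a factor of k*k, but the resulting GEMM runs at near-peak throughput on CPUs and GPUs, which is the trade-off the paper exploits.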
Proceedings ArticleDOI

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

TL;DR: A hardware accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 million gate equivalents (MGE), with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V.