Open Access Proceedings ArticleDOI

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

TLDR
In this article, the authors present FINN, a framework for building fast and flexible FPGA accelerators based on a heterogeneous streaming architecture. It implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.
Abstract
Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 µs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 µs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.
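The throughput figures above rest on the standard binarized-network identity: once weights and activations are constrained to {-1, +1} and packed as bits (1 encoding +1, 0 encoding -1), a dot product reduces to an XNOR followed by a population count. A minimal Python sketch of that arithmetic (the packing convention and function name are illustrative, not FINN's actual HLS implementation):

```python
def bnn_dot(w_bits: int, a_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bits
    (element i in bit i; bit = 1 encodes +1, bit = 0 encodes -1).

    Each position contributes +1 when the bits match and -1 otherwise,
    so sum(w * a) = matches - mismatches = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(w_bits ^ a_bits) & mask
    return 2 * bin(xnor).count("1") - n

# w = [+1, -1, +1, +1] packs to 0b1101; a = [+1, +1, -1, +1] packs to 0b1011.
# Products are +1, -1, -1, +1, so the dot product is 0.
assert bnn_dot(0b1101, 0b1011, 4) == 0
```

On an FPGA this XNOR-popcount replaces a multiply-accumulate tree entirely with LUT logic, which is what makes the per-layer throughput scaling described in the abstract feasible.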


Citations
Journal ArticleDOI

A Neuromorphic Chip Optimized for Deep Learning and CMOS Technology With Time-Domain Analog and Digital Mixed-Signal Processing

TL;DR: This work proposes the time-domain neural network (TDNN), which employs time-domain analog and digital mixed-signal processing (TDAMS) using delay time as the analog signal. TDAMS not only exploits energy-efficient analog computing but, thanks to its hardware efficiency, also enables a fully spatially unrolled architecture.
Proceedings ArticleDOI

Scalable high-performance architecture for convolutional ternary neural networks on FPGA

TL;DR: This work presents a highly versatile, FPGA-friendly architecture for ternary neural networks (TNNs) in which both the bit width of the input data and the level of parallelism can be varied at synthesis time, allowing throughput to be traded against hardware resources and power consumption.
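For context, ternary weights in {-1, 0, +1} are commonly obtained by symmetric thresholding of real-valued weights around zero; the zeros are what let ternary hardware skip work that a purely binary datapath cannot. A sketch under that assumption (the threshold value and names are illustrative, not taken from this paper):

```python
import numpy as np

def ternarize(w: np.ndarray, t: float = 0.05) -> np.ndarray:
    """Map real-valued weights to {-1, 0, +1} with a dead zone of width 2t.

    Values in [-t, t] become 0; elsewhere only the sign is kept.
    """
    q = np.sign(w)
    q[np.abs(w) <= t] = 0
    return q

w = np.array([0.30, -0.02, -0.40, 0.01])
print(ternarize(w))  # [ 1.  0. -1.  0.]
```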
Proceedings ArticleDOI

FPGA-based CNN inference accelerator synthesized from multi-threaded C software

TL;DR: In this article, a deep learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads, using the LegUp high-level synthesis (HLS) tool, which synthesizes the threads into parallel FPGA hardware and thereby translates software parallelism into spatial parallelism.
Journal ArticleDOI

Memristor-Based Binarized Spiking Neural Networks: Challenges and applications

TL;DR: The challenges that face the memristor-based acceleration of NNs and how binarized SNNs (BSNNs) may offer a good fit for these emerging hardware systems are explored.
Journal ArticleDOI

An Always-On 3.8 µJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS

TL;DR: A mixed-signal binary convolutional neural network (CNN) processor for always-on inference applications that achieves 3.8 µJ per classification, an improvement over the previous low-energy benchmark on CIFAR-10, achieved in part by sacrificing some programmability.
References
Posted Content

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

TL;DR: BinaryNet is introduced, a method for training DNNs whose binary weights and activations are used both at run-time and when computing the parameters' gradients. It drastically reduces memory usage and replaces most multiplications by 1-bit exclusive-NOR (XNOR) operations, which might have a big impact on both general-purpose and dedicated deep learning hardware.
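The trick that makes such training work is the straight-through estimator: the forward pass uses the sign of each weight, while the backward pass treats the sign function as the identity, clipped so gradients vanish where |w| > 1 and the underlying real-valued weights stay bounded. A minimal numpy sketch of that single step (function names are ours; this is not the paper's full training loop):

```python
import numpy as np

def binarize_forward(w: np.ndarray) -> np.ndarray:
    """Forward pass: constrain weights to {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_backward(w: np.ndarray, grad_out: np.ndarray) -> np.ndarray:
    """Straight-through estimator: pass the incoming gradient through
    unchanged, but zero it where |w| > 1."""
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([0.7, -1.3, -0.2])
print(binarize_forward(w))                    # [ 1. -1. -1.]
print(binarize_backward(w, np.ones_like(w)))  # [1. 0. 1.]
```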
Proceedings Article

High Performance Convolutional Neural Networks for Document Processing

TL;DR: Three novel approaches to speeding up CNNs are presented: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphic processing units).
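The "unrolling" in (a) is what is now usually called im2col: each receptive field of the input is copied into the row of a matrix, so the whole convolution collapses into a single BLAS matrix product. A minimal 2-D, single-channel Python sketch (names and shapes are ours, chosen for illustration):

```python
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """Unroll every k x k patch of a 2-D image into one row."""
    h, w = x.shape
    rows = [x[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
    return np.stack(rows)                # shape: (num_patches, k * k)

x = np.arange(16, dtype=float).reshape(4, 4)
kern = np.ones((3, 3))
cols = im2col(x, 3)                      # (4, 9)
out = cols @ kern.ravel()                # convolution as one matrix product
print(out.reshape(2, 2))                 # equals the direct 3x3 valid conv
```

The copy inflates memory by roughly a factor of k*k, but the resulting GEMM runs at near-peak throughput on CPUs and GPUs, which is the trade-off the paper exploits.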
Proceedings ArticleDOI

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

TL;DR: A hardware accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 million gate equivalents (MGE), with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V.