
Andrew Boutros

Researcher at University of Toronto

Publications -  27
Citations -  402

Andrew Boutros is an academic researcher at the University of Toronto. The author has contributed to research on the following topics: Field-programmable gate array & Computer science. The author has an h-index of 8 and has co-authored 20 publications receiving 193 citations. Previous affiliations of Andrew Boutros include Intel & Ruhr University Bochum.

Papers
Proceedings Article

Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs

TL;DR: An enhanced DSP block is presented that can efficiently pack 2× as many 9-bit and 4× as many 4-bit multiplications as the baseline Arria-10-like DSP block, at the cost of a 12% block area overhead, which leads to only a 0.6% increase in total FPGA core area.
Journal Article

HW/SW Co-Design of the HOG algorithm on a Xilinx Zynq SoC

TL;DR: Three implementations of the histogram of oriented gradients (HOG) algorithm are presented on the Zynq SoC, which combines an ARM processor and an FPGA, with the aim of achieving the highest performance for high-resolution images.
Proceedings Article

Beyond Peak Performance: Comparing the Real Performance of AI-Optimized FPGAs and GPUs

TL;DR: In this paper, the authors evaluate the performance of Intel's AI-optimized 14 nm FPGA, the Stratix 10 NX, with in-fabric AI tensor blocks that offer an estimated peak performance of up to 143 int8 TOPS.
Proceedings Article

Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs

TL;DR: Overall, the study shows that the FPGA is better than the GPU for persistent DL, and when integrated with an ASIC chiplet, it can offer a more compelling solution.
Journal Article

You Cannot Improve What You Do not Measure: FPGA vs. ASIC Efficiency Gaps for Convolutional Neural Network Inference

TL;DR: FPGA architectural changes such as increasing DSP block count, enhancing low-precision support in DSP blocks and rethinking the on-chip memories are suggested to reduce the programmability gap for DL applications.