Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, Phillip H. Jones
- pp 1-8
TLDR
A comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels is conducted, and rationales are discussed for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories.
Abstract:
Developing high-performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g., multi-core CPUs, GPUs, and FPGAs) and their associated vendor-optimized vision libraries, it is a challenge for developers to navigate this fragmented solution space. To help developers determine which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU and ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1–3.2× compared to the others for simple kernels. For more complicated kernels and complete vision pipelines, however, the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3×. We also observe that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
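The energy/frame reduction ratio quoted in the abstract can be illustrated with a short sketch. It assumes that energy per frame is the average power draw multiplied by the per-frame run time, and that the reduction ratio is a platform's energy per frame divided by that of the most efficient platform. The abstract does not spell out either definition, so both are assumptions. The function names and all power and timing values below are hypothetical placeholders, not measurements from the paper.

```python
def energy_per_frame_mj(avg_power_w: float, frame_time_ms: float) -> float:
    """Energy per frame in millijoules (watts * milliseconds = millijoules)."""
    return avg_power_w * frame_time_ms


def reduction_ratio(baseline_mj: float, candidate_mj: float) -> float:
    """How many times less energy the candidate uses per frame than the baseline."""
    return baseline_mj / candidate_mj


# Hypothetical (average power in W, run time per frame in ms) for one kernel.
# These are illustrative values only, NOT results reported in the paper.
measurements = {
    "CPU": (4.0, 10.0),
    "GPU": (6.0, 2.5),
    "FPGA": (5.0, 2.0),
}

# Energy per frame for each platform.
energy = {
    name: energy_per_frame_mj(power_w, time_ms)
    for name, (power_w, time_ms) in measurements.items()
}

# The platform with the lowest energy per frame serves as the reference.
best = min(energy, key=energy.get)

# Reduction ratio of the best platform relative to every platform.
ratios = {name: reduction_ratio(e, energy[best]) for name, e in energy.items()}

for name in measurements:
    print(f"{name}: {energy[name]:.1f} mJ/frame, "
          f"{best} uses {ratios[name]:.1f}x less energy")
```

Note that in this made-up example the FPGA draws more power than the CPU yet still wins on energy per frame, because its run time per frame is much shorter. This is why the comparison is made in energy per frame rather than in power alone.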
Citations
Peer Review
UAV in the advent of the twenties: Where we stand and what is next
Francesco Nex, Costas Armenakis, Michael Cramer, Davide Antonio Cucci, Merrill D. Gerke, Eija Honkavaara, Antero Kukko, Claudio Persello, Jan Skaloud
TL;DR: In this paper, the authors review best practices for the use of UAVs in remote sensing and mapping applications, report on current trends in UAV use, and discuss their future impact on photogrammetry and remote sensing.
Journal Article
Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA
TL;DR: A highly efficient layer-wise refined pruning method for deep neural networks is proposed at the software level, and the inference process is accelerated at the hardware level on a field-programmable gate array (FPGA).
Journal Article
Treehouse: A Case For Carbon-Aware Datacenter Software
TL;DR: It is argued that substantial reductions in the carbon intensity of datacenter computing are possible with a software-centric approach: by making energy and carbon visible to application developers on a fine-grained basis, and by modifying system APIs so that informed trade-offs between performance and carbon emissions can be made.
Proceedings Article
ReconROS: Flexible Hardware Acceleration for ROS2 Applications
TL;DR: In this paper, the authors present ReconROS, a framework that integrates the widely-used robot operating system (ROS) with ReconOS, which features multithreaded programming of hardware and software threads for reconfigurable computers.
Journal Article
Field Trial of a Flexible Real-Time Software-Defined GPU-Based Optical Receiver
Sjoerd van der Heide, Ruben S. Luis, Benjamin J. Puttnam, Georg Rademacher, Ton Koonen, Satoshi Shinada, Yoshinari Awaji, Hideaki Furukawa, Chigo Okonkwo
TL;DR: In this paper, a software-defined real-time multi-modulation format receiver implemented on an off-the-shelf general-purpose graphics processing unit (GPU) is presented.
References
Proceedings Article
Rodinia: A benchmark suite for heterogeneous computing
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, Kevin Skadron
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Proceedings Article
Accelerating Compute-Intensive Applications with GPUs and FPGAs
TL;DR: A comparative study of application behavior on accelerators, considering performance and code complexity, and a mapping from application characteristics to accelerator platforms are presented, which can aid developers in selecting an appropriate target architecture for their chosen application.
Proceedings Article
Scaling, power, and the future of CMOS
TL;DR: In this article, the authors briefly review the forces that caused the power problem, the solutions that were applied, and what the solutions tell us about the problem. As systems became more power constrained, optimizing power became more critical.
Proceedings Article
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
TL;DR: This paper analyzes an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores, and presents optimization strategies and use cases where each device is most effective.