Open Access, Proceedings Article

Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

TLDR
A comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels is conducted, and rationales are discussed for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories.
Abstract
Developing high-performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g., multi-core CPUs, GPUs, and FPGAs) and their associated vendor-optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To help developers determine which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU, and ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks, and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1–3.2× compared to the others for simple kernels. For more complicated kernels and complete vision pipelines, however, the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3×. It is also observed that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
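The abstract reports results as energy/frame reduction ratios. As a minimal sketch of how such a figure could be computed, the Python below assumes that energy per frame is average power multiplied by per-frame run time, and that the reduction ratio is a platform's energy per frame divided by that of the most efficient platform. Both definitions, the function names, and every number in `measurements` are hypothetical illustrations, not values or methods taken from the paper.

```python
def energy_per_frame_mj(avg_power_w: float, frame_time_ms: float) -> float:
    """Energy per frame in millijoules (watts x milliseconds = millijoules)."""
    return avg_power_w * frame_time_ms


def reduction_ratio(baseline_mj: float, candidate_mj: float) -> float:
    """How many times less energy per frame the candidate uses than the baseline."""
    return baseline_mj / candidate_mj


# Hypothetical (average power in W, time per frame in ms) for one kernel.
# These are made-up numbers for illustration only.
measurements = {
    "CPU": (4.0, 10.0),
    "GPU": (6.0, 2.5),
    "FPGA": (5.0, 4.0),
}

# Energy per frame for each platform.
energy = {
    name: energy_per_frame_mj(power, time_ms)
    for name, (power, time_ms) in measurements.items()
}

# The platform with the lowest energy per frame is the reference point.
best = min(energy, key=energy.get)

# Reduction ratio of the best platform relative to each of the others.
ratios = {
    name: reduction_ratio(value, energy[best])
    for name, value in energy.items()
    if name != best
}

if __name__ == "__main__":
    for name, value in energy.items():
        print(f"{name}: {value:.1f} mJ/frame")
    for name, ratio in ratios.items():
        print(f"{best} vs {name}: {ratio:.2f}x lower energy/frame")
```

Note that under these assumed numbers the platform with the highest power draw still has the lowest energy per frame, because it finishes each frame sooner; this is why energy per frame, rather than power alone, is the natural metric for this kind of comparison.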


Citations
Peer Review

UAV in the advent of the twenties: Where we stand and what is next

TL;DR: In this paper, the authors review best practices for the use of UAVs in remote sensing and mapping applications, report on current trends in UAV use, and discuss their future impact on photogrammetry and remote sensing.
Journal Article

Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA

TL;DR: A highly efficient layer-wise refined pruning method for deep neural networks is proposed at the software level, and the inference process is accelerated at the hardware level on a field-programmable gate array (FPGA).
Journal Article

Treehouse: A Case For Carbon-Aware Datacenter Software

TL;DR: It is argued that substantial reductions in the carbon intensity of datacenter computing are possible with a software-centric approach: by making energy and carbon visible to application developers on a fine-grained basis, and by modifying system APIs so that developers can make informed trade-offs between performance and carbon emissions.
Proceedings Article

ReconROS: Flexible Hardware Acceleration for ROS2 Applications

TL;DR: In this paper, the authors present ReconROS, a framework that integrates the widely-used robot operating system (ROS) with ReconOS, which features multithreaded programming of hardware and software threads for reconfigurable computers.
Journal Article

Field Trial of a Flexible Real-Time Software-Defined GPU-Based Optical Receiver

TL;DR: In this paper, a software-defined real-time multi-modulation format receiver implemented on an off-the-shelf general-purpose graphics processing unit (GPU) is presented.