Image convolution processing: A GPU versus FPGA comparison

doi:10.1109/SPL.2012.6211783

Proceedings ArticleDOI

Lucas M. Russo, +3 more

- pp 1-6

Abstract:

Convolution is one of the most important operators used in image processing. With the constant need to increase performance in high-end applications and the rising popularity of parallel architectures, such as GPUs and those implemented in FPGAs, comes the need to compare these architectures and determine which performs better and in which scenario. In this article, convolution was implemented on each of these architectures: in CUDA for GPUs and in Verilog for FPGAs. In addition, the same algorithms were implemented in MATLAB, using predefined operations, and in C on a regular x86 quad-core processor. Comparative performance measures, considering execution time and clock ratio, were taken and discussed in the paper. Overall, it was possible to achieve a CUDA speedup of roughly 200× over C, 70× over MATLAB, and 20× over the FPGA.
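
For readers unfamiliar with the operator being benchmarked, the following is a minimal sketch of direct 2D convolution in plain Python. It is not the authors' CUDA, Verilog, MATLAB, or C code; the function name `convolve2d`, the zero-padding border policy, and the odd-sized kernel are all assumptions made for illustration.

```python
def convolve2d(image, kernel):
    """Direct 2D convolution with zero padding (illustrative sketch only).

    Assumes `image` and `kernel` are non-empty lists of equal-length rows
    and that the kernel has odd dimensions. Output has the input's size.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    # Convolution flips the kernel, so the image index
                    # moves opposite to the kernel index.
                    yy = y - (i - r_off)
                    xx = x - (j - c_off)
                    # Pixels outside the image count as zero.
                    if 0 <= yy < rows and 0 <= xx < cols:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


# Usage: convolving a unit impulse with a kernel reproduces the kernel.
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = convolve2d(impulse, kernel)
```

Each output pixel depends only on the input image and the kernel, never on other output pixels, which is what makes the two outer loops natural candidates for parallel execution on hardware such as GPUs or FPGAs.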

Citations

Journal ArticleDOI

Literature Survey on Stereo Vision Disparity Map Algorithms

Rostam Affendi Hamzah, +1 more

- 01 Jan 2016 -

Journal of Sensors

TL;DR: This literature survey presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mappings and notes the implementation of previous software-based and hardware-based algorithms.

Journal ArticleDOI

Optimizing convolution operations on GPUs using adaptive tiling

Ben van Werkhoven, +3 more

- 01 Jan 2014 -

Future Generation Computer Systems

TL;DR: This paper extends a user transparent parallel programming model for MMCA to allow the execution of compute intensive operations on the GPUs present in the cluster, and presents a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs.

Proceedings ArticleDOI

Implementation of a fixed-point 2D Gaussian Filter for Image Processing based on FPGA

Frank C. Cabello, +3 more

TL;DR: The purpose of this study is to present the FPGA resource usage for different sizes of Gaussian kernel, to compare fixed-point and floating-point implementations, and to determine the number of bits needed to keep the root mean square error below 5%.

Proceedings ArticleDOI

Using VLIW softcore processors for image processing applications

Joost Hoozemans, +2 more

TL;DR: Results show that the rVEX softcore processor can achieve remarkably better performance compared to the industry-standard Xilinx MicroBlaze on image processing applications.

Dissertation

Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems

Egil Fykse

TL;DR: The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications, and an efficient FPGA implementation of direct normalized cross-correlation is created and compared against a GPU implementation from the OpenCV library.

References

Proceedings ArticleDOI

Accelerating Compute-Intensive Applications with GPUs and FPGAs

Shuai Che, +4 more

TL;DR: A comparative study of application behavior on accelerators considering performance and code complexity and an application characteristic to accelerator platform mapping are presented, which can aid developers in selecting an appropriate target architecture for their chosen application.

Proceedings ArticleDOI

Performance comparison of FPGA, GPU and CPU in image processing

Shuichi Asano, +2 more

TL;DR: This paper compares the performance of FPGA, GPU and CPU using three image processing applications: two-dimensional filters, stereo vision and k-means clustering. It makes clear which platform is faster under which conditions.

Proceedings ArticleDOI

BLAS Comparison on FPGA, CPU and GPU

Srinidhi Kestur, +2 more

TL;DR: A high-throughput accumulator is designed to perform an efficient reduction of floating-point values in order to obtain optimal performance for any aspect ratio of the matrices; the design targets the BEE3 FPGA platform.

Journal ArticleDOI

Comparing Hardware Accelerators in Scientific Applications: A Case Study

Rick Weber, +3 more

- 01 Jan 2011 -

IEEE Transactions on Parallel and Distri...

TL;DR: It is shown that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. It is also illustrated that graphics accelerators can make simulations involving large numbers of particles feasible.

Book ChapterDOI