Proceedings Article

Image convolution processing: A GPU versus FPGA comparison

Abstract
Convolution is one of the most important operators in image processing. The constant demand for higher performance in high-end applications, together with the rise of parallel architectures such as GPUs and those implemented in FPGAs, makes it necessary to compare these architectures and determine which performs better and in which scenarios. In this article, convolution was implemented on each of these architectures: in CUDA for GPUs and in Verilog for FPGAs. The same algorithms were also implemented in MATLAB, using predefined operations, and in C on a regular x86 quad-core processor. Comparative performance measures, considering execution time and clock ratio, were taken and are discussed in the paper. Overall, CUDA achieved a speedup of roughly 200× over C, 70× over MATLAB, and 20× over the FPGA.
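For readers unfamiliar with the operator, the following is a minimal sketch of direct 2D convolution in C, the language of the CPU baseline named in the abstract. It is not the authors' code: the function name `convolve2d`, the `float` pixel type, the row-major layout, and the zero-padded borders are all assumptions made for illustration, since the abstract does not specify them.

```c
#include <assert.h>
#include <stddef.h>

/* Direct 2D convolution of a grayscale image with a square kernel.
 * Assumptions for this sketch (not taken from the paper): float pixels,
 * row-major storage, odd kernel size, zero padding at the borders. */
void convolve2d(const float *in, float *out, int width, int height,
                const float *kernel, int ksize)
{
    assert(ksize > 0 && ksize % 2 == 1); /* odd size gives the kernel a centre */
    int r = ksize / 2;                   /* kernel radius */

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    /* Convolution flips the kernel: sample at (y - ky, x - kx). */
                    int sy = y - ky;
                    int sx = x - kx;
                    if (sy < 0 || sy >= height || sx < 0 || sx >= width)
                        continue;        /* outside the image: treated as zero */
                    acc += in[(size_t)sy * width + sx]
                         * kernel[(ky + r) * ksize + (kx + r)];
                }
            }
            out[(size_t)y * width + x] = acc;
        }
    }
}
```

Each output pixel depends only on the input image and the kernel, never on other output pixels. That independence is what makes the operator a natural fit for parallel hardware; a GPU version would typically assign one thread per output pixel, though the abstract does not describe how the authors structured their CUDA or Verilog implementations.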


Citations
Journal Article

Literature Survey on Stereo Vision Disparity Map Algorithms

TL;DR: This literature survey presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mappings and notes the implementation of previous software-based and hardware-based algorithms.
Journal Article

Optimizing convolution operations on GPUs using adaptive tiling

TL;DR: This paper extends a user-transparent parallel programming model for MMCA to allow the execution of compute-intensive operations on the GPUs present in the cluster, and presents a new optimization approach, called adaptive tiling, to implement a highly efficient yet flexible library-based convolution operation for modern GPUs.
Proceedings Article

Implementation of a fixed-point 2D Gaussian Filter for Image Processing based on FPGA

TL;DR: The purpose of this study is to present the FPGA resource usage for different sizes of Gaussian kernel, to compare fixed-point and floating-point implementations, and to determine how many bits are needed to keep the root mean square error below 5%.
Proceedings Article

Using VLIW softcore processors for image processing applications

TL;DR: Results show that the rVEX softcore processor can achieve remarkably better performance compared to the industry-standard Xilinx MicroBlaze on image processing applications.
Dissertation

Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems

Egil Fykse
TL;DR: The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications, and an efficient FPGA implementation of direct normalized cross-correlation is created and compared against a GPU implementation from the OpenCV library.
References
Proceedings Article

Accelerating Compute-Intensive Applications with GPUs and FPGAs

TL;DR: A comparative study of application behavior on accelerators, considering performance and code complexity, and a mapping from application characteristics to accelerator platforms are presented, which can aid developers in selecting an appropriate target architecture for their chosen application.
Proceedings Article

Performance comparison of FPGA, GPU and CPU in image processing

TL;DR: This paper compares the performance of FPGA, GPU and CPU using three image processing applications (two-dimensional filters, stereo vision and k-means clustering) and makes clear which platform is faster under which conditions.
Proceedings Article

BLAS Comparison on FPGA, CPU and GPU

TL;DR: A high-throughput accumulator is designed to perform an efficient reduction of floating point values in order to obtain optimal performance for any aspect ratio of the matrices and target the BEE3 FPGA platform.
Journal Article

Comparing Hardware Accelerators in Scientific Applications: A Case Study

TL;DR: It is shown that OpenCL provides application portability between multicore processors and GPUs but may incur a performance cost, and that graphics accelerators can make simulations involving large numbers of particles feasible.