Proceedings Article
Image convolution processing: A GPU versus FPGA comparison
Lucas M. Russo, Emerson Carlos Pedrino, E.R.R. Kato, Valentin Obac Roda +3 more
- pp 1-6
TLDR
Convolution was implemented on GPUs using CUDA and on FPGAs using Verilog; the same algorithms were also implemented in MATLAB, using predefined operations, and in C on a regular x86 quad-core processor.
Abstract:
Convolution is one of the most important operators used in image processing. With the constant need to increase performance in high-end applications and the growing popularity of parallel architectures, such as GPUs and those implemented in FPGAs, comes the need to compare these architectures in order to determine which of them performs better and in which scenarios. In this article, convolution was implemented on each of these architectures with the following languages: CUDA for GPUs and Verilog for FPGAs. In addition, the same algorithms were implemented in MATLAB, using predefined operations, and in C on a regular x86 quad-core processor. Comparative performance measures, considering execution time and clock ratio, are reported and discussed in the paper. Overall, it was possible to achieve a CUDA speedup of roughly 200× in comparison to C, 70× in comparison to MATLAB, and 20× in comparison to the FPGA.
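For readers unfamiliar with the operator being benchmarked, the following is a minimal sketch of direct 2D convolution in plain C, in the spirit of the sequential CPU baseline the abstract mentions. It is not the authors' code: the function name `convolve2d`, the row-major `float` image layout, and the zero-padding border policy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Direct 2D convolution of a row-major grayscale image with a square,
 * odd-sized kernel. Pixels outside the image are treated as zero
 * (zero padding); this border policy is an assumption of the sketch,
 * not a detail taken from the paper. */
void convolve2d(const float *in, float *out, int width, int height,
                const float *kernel, int ksize)
{
    assert(ksize > 0 && ksize % 2 == 1); /* kernel needs a center tap */
    int r = ksize / 2;                   /* kernel radius */

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    /* True convolution: the kernel is flipped, so tap
                     * (ky, kx) reads the input at (y - ky, x - kx). */
                    int sy = y - ky;
                    int sx = x - kx;
                    if (sy < 0 || sy >= height || sx < 0 || sx >= width)
                        continue; /* zero padding outside the image */
                    acc += in[(size_t)sy * width + sx]
                         * kernel[(ky + r) * ksize + (kx + r)];
                }
            }
            out[(size_t)y * width + x] = acc;
        }
    }
}
```

Each output pixel depends only on the input image and the kernel, never on other output pixels, so the two outer loops can run in any order. That independence is what makes the operator a natural fit for the parallel architectures compared here.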
Citations
Journal Article
Literature Survey on Stereo Vision Disparity Map Algorithms
TL;DR: This literature survey presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mappings and notes the implementation of previous software-based and hardware-based algorithms.
Journal Article
Optimizing convolution operations on GPUs using adaptive tiling
TL;DR: This paper extends a user-transparent parallel programming model for MMCA to allow the execution of compute-intensive operations on the GPUs present in the cluster, and presents a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs.
Proceedings Article
Implementation of a fixed-point 2D Gaussian Filter for Image Processing based on FPGA
TL;DR: The purpose of this study is to present the FPGA resource usage for different sizes of Gaussian kernel; to provide a comparison between fixed-point and floating-point implementations; and to determine the number of bits needed to keep the Root Mean Square Error below 5%.
Proceedings Article
Using VLIW softcore processors for image processing applications
TL;DR: Results show that the rVEX softcore processor can achieve remarkably better performance compared to the industry-standard Xilinx MicroBlaze on image processing applications.
Dissertation
Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems
TL;DR: The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications, and an efficient FPGA implementation of direct normalized cross-correlation is created and compared against a GPU implementation from the OpenCV library.
References
Proceedings Article
Accelerating Compute-Intensive Applications with GPUs and FPGAs
TL;DR: A comparative study of application behavior on accelerators considering performance and code complexity and an application characteristic to accelerator platform mapping are presented, which can aid developers in selecting an appropriate target architecture for their chosen application.
Proceedings Article
Performance comparison of FPGA, GPU and CPU in image processing
TL;DR: This paper compares the performance of FPGA, GPU and CPU using three applications in image processing; two-dimensional filters, stereo-vision and k-means clustering, and makes it clear which platform is faster under which conditions.
Proceedings Article
BLAS Comparison on FPGA, CPU and GPU
TL;DR: A high-throughput accumulator is designed to perform an efficient reduction of floating point values in order to obtain optimal performance for any aspect ratio of the matrices and target the BEE3 FPGA platform.
Journal Article
Comparing Hardware Accelerators in Scientific Applications: A Case Study
TL;DR: It is shown that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost and it is illustrated that graphics accelerators can make simulations involving large numbers of particles feasible.