Author

Lucas M. Russo

Bio: Lucas M. Russo is an academic researcher from the Federal University of São Carlos. The author has contributed to research in topics: CUDA & Verilog. The author has an h-index of 1 and has co-authored 1 publication receiving 35 citations.

Papers
Proceedings ArticleDOI
20 Mar 2012
TL;DR: In this article, convolution was implemented on GPUs and FPGAs using CUDA and Verilog, respectively; the same algorithms were also implemented in MATLAB, using predefined operations, and in C on a regular x86 quad-core processor.
Abstract: Convolution is one of the most important operators used in image processing. With the constant need to increase performance in high-end applications, and with the rise in popularity of parallel architectures such as GPUs and those implemented on FPGAs, comes the need to compare these architectures to determine which of them performs better and in which scenarios. In this article, convolution was implemented in each of the aforementioned architectures with the following languages: CUDA for GPUs and Verilog for FPGAs. In addition, the same algorithms were also implemented in MATLAB, using predefined operations, and in C on a regular x86 quad-core processor. Comparative performance measures, considering execution time and clock ratio, were taken and commented on in the paper. Overall, it was possible to achieve a CUDA speedup of roughly 200× in comparison to C, 70× in comparison to MATLAB, and 20× in comparison to the FPGA.

36 citations
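To make the comparison concrete, here is a minimal sketch of how 2D convolution typically maps onto CUDA: one thread per output pixel, with the filter coefficients held in constant memory. The kernel size, the clamped border handling, and all identifiers are illustrative assumptions, not the paper's actual code.

```cuda
#include <cuda_runtime.h>

// Hedged sketch of a per-pixel 2D convolution kernel. KSIZE and the
// clamped border handling are illustrative choices, not from the paper.
#define KSIZE 3              // assumed odd kernel width
#define KRAD  (KSIZE / 2)

__constant__ float d_kernel[KSIZE * KSIZE];   // filter coefficients

__global__ void convolve2d(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;    // guard partial blocks

    float acc = 0.0f;
    for (int ky = -KRAD; ky <= KRAD; ++ky)
        for (int kx = -KRAD; kx <= KRAD; ++kx) {
            int ix = min(max(x + kx, 0), width - 1);   // clamp at image border
            int iy = min(max(y + ky, 0), height - 1);
            acc += in[iy * width + ix] *
                   d_kernel[(ky + KRAD) * KSIZE + (kx + KRAD)];
        }
    out[y * width + x] = acc;
}
```

The one-thread-per-pixel mapping is what lets the GPU exploit the operator's data parallelism, which is the source of the large speedups reported over the sequential C version.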


Cited by
Journal ArticleDOI
TL;DR: This literature survey presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mapping and covers previous software-based and hardware-based implementations.
Abstract: This paper presents a literature survey on existing disparity map algorithms. It focuses on the four main stages of processing proposed by Scharstein and Szeliski in their 2002 taxonomy and evaluation of dense two-frame stereo correspondence algorithms. To assist future researchers in developing their own stereo matching algorithms, a summary of the existing algorithms developed for every stage of processing is also provided. The survey also notes previous software-based and hardware-based implementations. Generally, the main processing module for a software-based implementation uses only a central processing unit. By contrast, a hardware-based implementation requires one or more additional processors for its processing module, such as a graphics processing unit or a field-programmable gate array. This literature survey also presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mapping.

212 citations
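The disparity computation stage of that taxonomy is often illustrated with winner-takes-all block matching. Below is a hedged CUDA sketch of that idea using a Sum of Absolute Differences (SAD) cost; the window size, disparity range, and all names are illustrative assumptions, not drawn from the survey.

```cuda
#include <climits>

#define WIN_RAD  3     // assumed 7x7 support window
#define MAX_DISP 64    // assumed disparity search range

// One thread computes the disparity of one pixel in the left image by
// scanning MAX_DISP candidate shifts and keeping the lowest SAD cost.
__global__ void sadDisparity(const unsigned char *left, const unsigned char *right,
                             unsigned char *disp, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int bestCost = INT_MAX, bestDisp = 0;
    for (int d = 0; d < MAX_DISP; ++d) {
        int cost = 0;
        for (int wy = -WIN_RAD; wy <= WIN_RAD; ++wy)
            for (int wx = -WIN_RAD; wx <= WIN_RAD; ++wx) {
                int lx = min(max(x + wx, 0), width - 1);   // clamp at borders
                int ly = min(max(y + wy, 0), height - 1);
                int rx = min(max(lx - d, 0), width - 1);
                cost += abs((int)left[ly * width + lx] - (int)right[ly * width + rx]);
            }
        if (cost < bestCost) { bestCost = cost; bestDisp = d; }  // winner-takes-all
    }
    disp[y * width + x] = (unsigned char)bestDisp;
}
```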

Journal ArticleDOI
TL;DR: This paper extends a user-transparent parallel programming model for MMCA to allow the execution of compute-intensive operations on the GPUs present in the cluster, and presents a new optimization approach, called adaptive tiling, to implement a highly efficient yet flexible library-based convolution operation for modern GPUs.

49 citations
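Adaptive tiling itself is not reproduced here, but the fixed shared-memory tiling it builds on is easy to sketch: each thread block stages its input tile plus a halo in shared memory, so neighboring threads reuse fetched pixels. Tile size, filter radius, and identifiers below are illustrative assumptions, not the paper's library code.

```cuda
#define TILE 16                  // assumed thread-block tile width
#define KRAD 2                   // assumed filter radius (5x5 kernel)
#define KW   (2 * KRAD + 1)

__constant__ float d_filt[KW * KW];   // filter coefficients

__global__ void convolveTiled(const float *in, float *out, int width, int height)
{
    // The shared tile covers the block's output pixels plus a KRAD-wide halo,
    // so each input pixel is fetched from global memory once per block.
    __shared__ float tile[TILE + 2 * KRAD][TILE + 2 * KRAD];

    int bx = blockIdx.x * TILE, by = blockIdx.y * TILE;

    // Cooperative load: 16x16 threads stride over the (TILE + 2*KRAD)^2 tile.
    for (int ty = threadIdx.y; ty < TILE + 2 * KRAD; ty += TILE)
        for (int tx = threadIdx.x; tx < TILE + 2 * KRAD; tx += TILE) {
            int ix = min(max(bx + tx - KRAD, 0), width - 1);   // clamp borders
            int iy = min(max(by + ty - KRAD, 0), height - 1);
            tile[ty][tx] = in[iy * width + ix];
        }
    __syncthreads();

    int x = bx + threadIdx.x, y = by + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int ky = 0; ky < KW; ++ky)
        for (int kx = 0; kx < KW; ++kx)
            acc += tile[threadIdx.y + ky][threadIdx.x + kx] * d_filt[ky * KW + kx];
    out[y * width + x] = acc;
}
```

A fixed, compile-time TILE like this is the rigidity that adaptive tiling, as the name suggests, makes flexible.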

Proceedings ArticleDOI
28 Dec 2015
TL;DR: The purpose of this study is to present the FPGA resource usage for different sizes of Gaussian kernel, to provide a comparison between fixed-point and floating-point implementations, and to determine how many bits are necessary to keep the Root Mean Square Error below 5%.
Abstract: One of the most useful techniques in image processing is the 2D Gaussian filter, especially when smoothing images. However, the implementation of a 2D Gaussian filter requires heavy computational resources, and when it comes down to real-time applications, efficiency in the implementation is vital. Floating-point math represents an obstacle here, as it requires a large amount of computational power to achieve real-time image processing. A fixed-point approach is much more suitable: implementing a 2D Gaussian filter on an FPGA with fixed-point arithmetic yields efficient processing and reduced computational cost. The purpose of this study is to present the FPGA resource usage for different sizes of Gaussian kernel, to provide a comparison between fixed-point and floating-point implementations, and to determine how many bits are necessary to keep the Root Mean Square Error (RMSE) below 5%.

45 citations
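The bit-width trade-off the study quantifies can be previewed with a small host-side experiment: quantize Gaussian coefficients to a given number of fractional bits and compare them against the floating-point reference. Note that the paper measures RMSE on the filtered images; the sketch below measures only the quantization error of the coefficients themselves, and all constants and names are assumptions.

```cuda
#include <math.h>
#include <stdio.h>

#define KSIZE     5        // assumed 1D kernel length
#define FRAC_BITS 8        // assumed number of fractional bits (Q0.8)

int main(void)
{
    const float sigma = 1.0f;              // assumed standard deviation
    float ref[KSIZE], sum = 0.0f;

    // Floating-point reference Gaussian, normalized to sum to 1.
    for (int i = 0; i < KSIZE; ++i) {
        float d = (float)(i - KSIZE / 2);
        ref[i] = expf(-d * d / (2.0f * sigma * sigma));
        sum += ref[i];
    }
    for (int i = 0; i < KSIZE; ++i) ref[i] /= sum;

    // Quantize each coefficient to FRAC_BITS fractional bits and
    // accumulate the squared error against the reference.
    float err2 = 0.0f;
    for (int i = 0; i < KSIZE; ++i) {
        int   q  = (int)(ref[i] * (1 << FRAC_BITS) + 0.5f);  // round to fixed point
        float fq = (float)q / (float)(1 << FRAC_BITS);
        err2 += (ref[i] - fq) * (ref[i] - fq);
    }
    printf("coefficient RMSE at %d fractional bits: %g\n",
           FRAC_BITS, sqrtf(err2 / KSIZE));
    return 0;
}
```

Sweeping FRAC_BITS in a harness like this mirrors the paper's question of how few bits still keep the error acceptably small.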

Proceedings ArticleDOI
19 Jul 2015
TL;DR: Results show that the rVEX softcore processor can achieve remarkably better performance compared to the industry-standard Xilinx MicroBlaze on image processing applications.
Abstract: The ever-increasing complexity of advanced high-resolution image processing applications requires innovative solutions that address this challenge efficiently and cost-effectively. This paper discusses the use of reconfigurable general-purpose softcore processors in image processing applications, such that hardware resources are used efficiently while high image processing performance is maintained for the targeted application. Results show that the rVEX softcore processor can achieve remarkably better performance than the industry-standard Xilinx MicroBlaze (up to a factor of 3.2 times faster) on image processing applications.

21 citations

Dissertation
01 Jan 2013
TL;DR: The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications, and an efficient FPGA implementation of direct normalized cross-correlation is created and compared against a GPU implementation from the OpenCV library.
Abstract: The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications. Normalized cross-correlation is used as a benchmark, because this algorithm includes convolution, a common operation in image processing and elsewhere. Normalized cross-correlation is a template matching algorithm that is used to locate predefined objects in a scene image. Because the throughput of DSPs is low for efficient calculation of normalized cross-correlation, the focus is on FPGAs and GPUs. An efficient FPGA implementation of direct normalized cross-correlation is created and compared against a GPU implementation from the OpenCV library. Performance, cost, development time and power consumption are evaluated for the two platforms. The performance of the GPU implementation is slightly better than the FPGA implementation, and less time is spent developing a working solution. However, the power consumption of the GPU is higher. Both solutions are viable, so the most suitable platform will depend on the specific project requirements for image size, throughput, latency, power consumption, cost and development time.

21 citations
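For reference, zero-mean normalized cross-correlation scores a template T against an image window W as the sum of (W - mean(W)) * (T - mean(T)) divided by the square root of the product of the two summed squared deviations. Below is a hedged CUDA sketch of the direct (non-FFT) form, one thread per candidate position; names and the thread mapping are assumptions, not the thesis's FPGA design or OpenCV's code.

```cuda
#include <math.h>

// Direct zero-mean NCC: each thread scores one top-left template position.
__global__ void nccScore(const float *image, int iw, int ih,
                         const float *tmpl, int tw, int th,
                         float tmplMean, float *score)
{
    int u = blockIdx.x * blockDim.x + threadIdx.x;   // window top-left x
    int v = blockIdx.y * blockDim.y + threadIdx.y;   // window top-left y
    if (u > iw - tw || v > ih - th) return;

    // Mean of the image window under the template.
    float winMean = 0.0f;
    for (int y = 0; y < th; ++y)
        for (int x = 0; x < tw; ++x)
            winMean += image[(v + y) * iw + (u + x)];
    winMean /= (float)(tw * th);

    // Zero-mean cross term and the two energy terms of the denominator.
    float num = 0.0f, eImg = 0.0f, eTpl = 0.0f;
    for (int y = 0; y < th; ++y)
        for (int x = 0; x < tw; ++x) {
            float di = image[(v + y) * iw + (u + x)] - winMean;
            float dt = tmpl[y * tw + x] - tmplMean;
            num  += di * dt;
            eImg += di * di;
            eTpl += dt * dt;
        }

    // Scores lie in [-1, 1]; the epsilon guards flat windows.
    score[v * (iw - tw + 1) + u] = num / (sqrtf(eImg * eTpl) + 1e-12f);
}
```

The heavy inner loops over the template are the convolution-like workload that makes this algorithm a useful benchmark across FPGAs and GPUs.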