Author

Abner Correa Barros

Bio: Abner Correa Barros is an academic researcher from the Federal University of Pernambuco. The author has contributed to research in topics: Field-programmable gate array & General-purpose computing on graphics processing units. The author has an h-index of 2 and has co-authored 4 publications receiving 9 citations.

Papers
Book ChapterDOI
01 Jan 2013
TL;DR: This work presents a case study in the oil and gas industry, namely the FPGA implementation of the 2D reverse time migration (RTM) seismic modeling algorithm, and suggests strategies such as reduced arithmetic precision, based on fixed-point numbers, and a highly parallel architecture.
Abstract: This work presents a case study in the oil and gas industry, namely the FPGA implementation of the 2D reverse time migration (RTM) seismic modeling algorithm. FPGAs have been widely used as accelerators in scientific computing applications that require massive data processing, large parallel machines, huge memory bandwidth and power. The RTM algorithm makes it possible to solve the acoustic and elastic wave problems directly and with precision in complex geological structures, at the cost of high computational power. To face such challenges, strategies such as reduced arithmetic precision, based on fixed-point numbers, and a highly parallel architecture are suggested. The effects of such reduced precision on stored and processed data are analyzed in this chapter through signal-to-noise ratio (SNR) and universal image quality index (UIQI) metrics. The results show that an SNR higher than 50 dB can be considered acceptable for a migrated image with a 15-bit word size. A special stream-processing architecture aiming to implement the best possible data reuse for the algorithm is also presented. It was implemented as a FIFO-based cache in the internal memory of the FPGA. A temporal pipeline structure has also been developed, allowing multiple time steps to be performed at the same time. The main advantage of this approach is the ability to keep the same memory bandwidth requirements as processing just one time step. The number of time steps processed at the same time is limited by the amount of FPGA internal memory and logic blocks. The algorithm was implemented on an Altera Stratix 260E, with 16 processing elements (PEs). The FPGA was 29 times faster than the CPU and only 13% slower than the GPGPU. In terms of power consumption, the CPU+FPGA was 1.7 times more efficient than the GPGPU system.
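To make the precision study concrete, here is a minimal sketch, not the authors' implementation, of the kind of experiment the chapter describes: quantizing a migrated image to a signed fixed-point word and measuring the resulting SNR. The Q-format split, the synthetic test data, and all names are illustrative assumptions.

```python
import numpy as np

def to_fixed_point(x, word_bits, frac_bits):
    """Round x onto a signed fixed-point grid with `word_bits` total bits,
    `frac_bits` of them fractional, saturating at the format limits."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (word_bits - 1))
    hi = 2 ** (word_bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

def snr_db(reference, approximation):
    """Signal-to-noise ratio in dB of `approximation` against `reference`."""
    noise = reference - approximation
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Synthetic stand-in for a migrated image: a random field normalized to [-1, 1].
rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))
image /= np.abs(image).max()

for word in (12, 15, 18):  # 15 bits is the word size reported in the chapter
    fixed = to_fixed_point(image, word_bits=word, frac_bits=word - 1)
    print(f"{word}-bit word: SNR = {snr_db(image, fixed):.1f} dB")
```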

3 citations

Proceedings ArticleDOI
26 Oct 2011
TL;DR: In this paper, a real case study was used to evaluate the efficiency of two different metrics applied to a seismic application based on the RTM algorithm; the main strategy is to explore precision reduction in terms of the SNR (Signal-to-Noise Ratio) and UIQI (Universal Image Quality Index) metrics in order to improve the performance of the system.
Abstract: The recent increase in the computing power of FPGAs has allowed their use in areas such as seismic data processing. Additionally, besides the capability of performing computations in parallel, FPGAs also support application-specific number representations. In this type of application, in order to achieve better performance, the processing and storage of data is usually done in a fixed-point format instead of the floating-point standard. However, the change of representation can cause a degradation in the quality of the results. In the petroleum industry, a seismic image of poor quality can lead to an erroneous interpretation of the subsurface, resulting in catastrophic losses. For this reason, it is essential that the quality of data obtained from low-precision seismic data processing can be evaluated against reliable technical criteria. In this paper, a real case study was used to evaluate the efficiency of two different metrics applied to a seismic application based on the RTM algorithm. The main strategy is to explore precision reduction in terms of the SNR (Signal-to-Noise Ratio) and UIQI (Universal Image Quality Index) metrics, in order to improve the performance of the system. Results show a performance gain of 50% compared with the architecture implemented in hardware using the IEEE 754 floating-point standard.
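The second metric, UIQI (Wang & Bovik's Universal Image Quality Index), folds correlation, luminance distortion, and contrast distortion into a single score. Below is a minimal sketch of the global index; the windowed, averaged variant usually applied in practice, and the paper's exact setup, are not reproduced here, and all names are illustrative.

```python
import numpy as np

def uiqi(x, y):
    """Universal Image Quality Index between images x and y (1.0 = identical):
    Q = 4*cov(x,y)*mean(x)*mean(y) / ((var(x)+var(y)) * (mean(x)^2+mean(y)^2)).
    Note: for near-zero-mean data (common in seismic sections) the denominator
    degenerates, which is one reason the windowed variant is preferred."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))

# Quick check on synthetic data (offset keeps the mean away from zero).
rng = np.random.default_rng(1)
ref = rng.standard_normal((64, 64)) + 2.0
noisy = ref + 0.01 * rng.standard_normal((64, 64))
print(uiqi(ref, noisy))  # close to 1.0 for a mild perturbation
```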

3 citations

Proceedings ArticleDOI
01 Sep 2008
TL;DR: The architecture of an accumulative multiplier (MAC) in double-precision floating-point, according to the IEEE-754 standard, is described, together with the architecture of a matrix multiplier that uses instances of the developed MACs and explores data reuse through the BRAMs of a Xilinx Virtex 4 LX200 FPGA.
Abstract: Recently, the manufacturers of supercomputers have made use of FPGAs to accelerate scientific applications [16][17]. Traditionally, FPGAs were used only in non-scientific applications. The main reasons for this were the complexity of floating-point computation, the insufficiency of FPGA logic cells for implementing scientific cores, and core complexity that prevented operation at high frequencies. Nowadays, the increased availability of specialized blocks for complex operations, such as adder and multiplier blocks implemented directly in the FPGA, and the increase in internal RAM blocks (BRAMs) have made possible high-performance systems that use the FPGA as a processing element for scientific computation [2]. These devices are used as co-processors that execute intensive computation. The emphasis of these architectures is the exploitation of the parallelism present in scientific computing operations and of data reuse. In most of these applications, scientific computation generally involves operations on large, dense floating-point matrices, which are normally handled by MACs. In this work, we describe the architecture of an accumulative multiplier (MAC) in double-precision floating-point, according to the IEEE-754 standard, and we propose the architecture of a matrix multiplier that uses instances of the developed MACs and explores data reuse through the BRAMs (blocks of RAM internal to the FPGA) of a Xilinx Virtex 4 LX200 FPGA. The synthesis results showed that the implemented MAC can reach a performance of 4 GFLOPS.
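As a rough software analogue, not the paper's RTL, the sketch below models the proposed matrix multiplier: the tile buffers stand in for BRAM blocks that are loaded once and reused many times, and each tile accumulation stands in for the double-precision MAC units. The tile size and the divisibility assumption are ours.

```python
import numpy as np

TILE = 64  # assumed BRAM tile size; n must be a multiple of TILE here

def tiled_matmul(A, B):
    """Blocked matrix multiply modelling BRAM data reuse: each A/B tile is
    fetched once per output tile and reused across TILE multiply-accumulates."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, TILE):
        for j0 in range(0, n, TILE):
            acc = np.zeros((TILE, TILE))        # accumulator registers
            for k0 in range(0, n, TILE):
                a = A[i0:i0+TILE, k0:k0+TILE]   # tile held in "BRAM"
                b = B[k0:k0+TILE, j0:j0+TILE]
                acc += a @ b                    # the MAC: multiply + accumulate
            C[i0:i0+TILE, j0:j0+TILE] = acc
    return C

# Sanity check against the reference product.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256))
B = rng.standard_normal((256, 256))
assert np.allclose(tiled_matmul(A, B), A @ B)
```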

2 citations

Proceedings ArticleDOI
12 Nov 2011
TL;DR: A solution is presented that takes advantage of the FPGA's flexibility to efficiently exploit data reuse and parallelization in both the time and space domains for the first processing stage of the RTM (Reverse Time Migration) algorithm, the seismic modeling.
Abstract: Hardware accelerators like GPGPUs and FPGAs have been used as an alternative to conventional computing architectures (CPUs) in scientific computing applications and have shown considerable speed-ups. In this context, this poster presents a solution that takes advantage of the FPGA's flexibility to efficiently exploit data reuse and parallelization in both the time and space domains for the first processing stage of the RTM (Reverse Time Migration) algorithm, the seismic modeling. In order to obtain a benchmark for our FPGA implementation, we also implemented the same algorithms for CPU and GPGPU architectures. Our results showed that FPGAs are a feasible platform for this set of applications. The experimental results have shown a 1.67x speed-up when compared to a Tesla C1060 GPGPU and a 25.79x speed-up when compared to an AMD Athlon 64 X2 CPU.
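For reference, the computational core being accelerated, the seismic-modeling stage of RTM, amounts to a finite-difference time step of the 2D acoustic wave equation. Below is a minimal NumPy sketch of one second-order-in-space step; the stencil order, boundary handling, and parameter names are assumptions, and the FPGA version instead streams this stencil through FIFO line buffers.

```python
import numpy as np

def acoustic_step(p_prev, p_cur, vel, dt, dx):
    """Advance the pressure field one time step:
    p_next = 2*p_cur - p_prev + (vel*dt/dx)^2 * laplacian(p_cur)."""
    # 5-point Laplacian on the interior of the grid.
    lap = (-4.0 * p_cur[1:-1, 1:-1]
           + p_cur[:-2, 1:-1] + p_cur[2:, 1:-1]
           + p_cur[1:-1, :-2] + p_cur[1:-1, 2:])
    p_next = np.copy(p_cur)  # boundary values kept fixed for simplicity
    p_next[1:-1, 1:-1] = (2.0 * p_cur[1:-1, 1:-1] - p_prev[1:-1, 1:-1]
                          + (vel[1:-1, 1:-1] * dt / dx) ** 2 * lap)
    return p_next
```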

1 citation


Cited by
Journal ArticleDOI
TL;DR: This book is an excellent state-of-the-art review of RC and would be a worthwhile acquisition for anyone seriously considering speeding up a specific application.
Abstract: Reconfigurable Computing. Accelerating Computation with Field-Programmable Gate Arrays by Maya B. Gokhale and Paul S. Graham Springer, 2005, 238 pp. ISBN-13 978-0387-26105-8, $87.20 Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays is an expository and easy-to-digest book. The authors are recognized leaders with many years of experience in the field of reconfigurable computing. The book is written so that non-specialists can understand the principles, techniques and algorithms. Each chapter has many excellent references for interested readers. It surveys methods, algorithms, programming languages and applications targeted at reconfigurable computing. Automatic generation of parallel code from a sequential program on conventional microprocessor architectures remains an open problem. Nevertheless, a wide range of computationally intensive applications have benefited from the many tools developed to tackle it. For RC, it is an even much harder problem (perhaps 10x and up), and intense research is being devoted to making RC a commonplace practical tool. The aim of the authors is threefold. First, guide readers through current issues in HLLs for RC. Second, help readers understand the intricate process of algorithm-to-hardware compilation. And third, show that, even though this process is painful, if the application is suitable for RC the gains in performance are huge. The book is divided into two parts. The first part contains four chapters about reconfigurable computing and languages. Chapter 1 presents an introduction to RC, contrasting conventional fixed-instruction microprocessors with RC architectures. This chapter also contains comprehensive reference material for further reading. Chapter 2 introduces reconfigurable logic devices by explaining the basic architecture and configuration of FPGAs. Chapter 3 deals with RC systems by discussing how parallel processing is achieved on reconfigurable computers and also gives a survey of RC systems today. Then, in Chapter 4, languages, compilation, debugging and their related manual-vs-automatic issues are discussed. The second part of the book comprises five chapters about applications of RC. Chapters 5 and 6 discuss digital signal and image processing applications. Chapter 7 covers the application of RC to secure network communications. The aim of Chapter 8 is to discuss some important bioinformatics applications for which RC is a good candidate, their algorithmic problems and hardware implementations. Finally, Chapter 9 covers two applications of reconfigurable supercomputers. The first is a simulation of radiative heat transfer and the second models large urban road traffic. This book is neither a technical manual nor a textbook, but in the opinion of this reviewer, it is an excellent state-of-the-art review of RC and would be a worthwhile acquisition for anyone seriously considering speeding up a specific application. On the downside, it is somewhat disappointing that the book does not contain more information about HLL tools that could help close the gap between the traditional HPC community and the raw computing power of RC. Edusmildo Orozco, Department of Computer Science, University of Puerto Rico at Rio Piedras.

105 citations

Journal ArticleDOI
TL;DR: Reverse time migration (RTM), as discussed by the authors, is a seismic imaging method for mapping subsurface reflectivity using recorded seismic waveforms, and it is the only method capable of using all seismic wave types that can be computed numerically.

77 citations

Proceedings ArticleDOI
16 May 2011
TL;DR: A PC cluster is developed based on nodes that use FPGAs as co-processors for large dense floating-point matrix multiplication, showing performance improvements compared with an Intel Core2 Quad at 2.66 GHz.
Abstract: Field Programmable Gate Arrays (FPGAs) are able to provide a high degree of computational parallelism that can be exploited to achieve significant performance improvements in data-intensive processing problems. In this paper our efforts were directed towards developing a PC cluster based on nodes that use FPGAs as co-processors. The target application is large dense floating-point matrix multiplication. Experimental results for just one node of the cluster, consisting of a Xilinx Virtex 5 VLX50T with a PCI interface, showed performance improvements compared with an Intel Core2 Quad at 2.66 GHz, achieving a speed-up of 1.19 times. Other analyses in terms of frequency variation and power dissipation have been made by considering different matrix sizes running on one node of the cluster. Recently, the platform has been updated to a powerful Gidel platform, the PROCe III 260E. This new platform consists of one Stratix III FPGA per board. On this board, it is possible to allocate up to 40 MACs per FPGA, reaching an overall speed-up of approximately 11.2 per node of the cluster when compared with the same general-purpose processor. A full example is presented in this paper.
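A back-of-envelope sketch of how such per-node figures arise from the MAC count: with the 40 MACs per FPGA quoted above, each performing one multiply and one add per cycle, peak throughput scales linearly with the clock. The clock frequency below is purely an assumed value for illustration, not a figure from the paper.

```python
# Peak-throughput estimate for one cluster node.
macs_per_fpga = 40         # from the abstract
flops_per_mac = 2          # one multiply + one add per cycle
clock_hz = 200e6           # assumed FPGA clock frequency

peak_gflops = macs_per_fpga * flops_per_mac * clock_hz / 1e9
print(f"Peak per node: {peak_gflops:.0f} GFLOPS")  # 16 GFLOPS at 200 MHz
```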

11 citations

Journal ArticleDOI
TL;DR: In this paper, a stable solution of the isotropic elastic wave equation is obtained despite the very narrow dynamic range of the half-precision format, even though it is not obvious how the accuracy of the solution can be preserved with such a narrow 16-bit representation.
Abstract: New processors increasingly support half-precision floating-point numbers, often with a significant throughput gain over single-precision operations. Seismic modeling, imaging, and inversion could benefit from such an acceleration, but it is not obvious how the accuracy of the solution can be preserved with a very narrow 16-bit representation. By scaling the finite-difference expression of the isotropic elastic wave equation, we have found that a stable solution can be obtained despite the very narrow dynamic range of the half-precision format. We develop an implementation with the CUDA platform, which, on the most recent graphics processing units (GPUs), is nearly twice as fast and uses half the memory of the equivalent single-precision version. The error on seismograms caused by the reduced precision is shown to correspond to a negligible fraction of the total seismic energy and is mostly incoherent with seismic phases. Finally, we find that this noise does not adversely impact full-waveform inversion or reverse time migration, both of which benefit from the higher throughput of half-precision computation.
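The core difficulty the paper addresses can be shown in a few lines of NumPy: seismic amplitudes often sit far below float16's normal range (normals span roughly 6e-5 to 6.5e4), so a direct cast loses accuracy, while a scaled representation does not. The scale-factor choice here is our illustrative assumption; the paper instead folds the scaling into the finite-difference expression itself and runs on GPUs via CUDA.

```python
import numpy as np

rng = np.random.default_rng(2)
wavefield = 1e-7 * rng.standard_normal((512, 512))   # tiny seismic amplitudes

def max_rel_err(approx, exact):
    return np.abs(approx - exact).max() / np.abs(exact).max()

# Direct cast: values land in float16's subnormal range and lose precision.
direct = wavefield.astype(np.float16).astype(np.float64)

# Scaled cast: factor s moves values into float16's well-behaved mid-range;
# the scale is undone only when results are read out.
s = 1.0 / np.abs(wavefield).max()
scaled = ((s * wavefield).astype(np.float16)).astype(np.float64) / s

print(f"direct cast: {max_rel_err(direct, wavefield):.1e}")  # large error
print(f"scaled cast: {max_rel_err(scaled, wavefield):.1e}")  # ~float16 eps
```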

7 citations
