Proceedings ArticleDOI
An FPGA-Based Accelerator to Speed-Up Matrix Multiplication of Floating Point Operations
B. Holanda, R. Pimentel, J. R. O. Barbosa, R. Camarotti, Abel G. Silva-Filho, L. Joao, V.L. Souza, J. M. G. Ferraz, Manoel Eusebio de Lima +8 more
- pp 306-309
TLDR
The paper develops a PC cluster whose nodes use FPGAs as co-processors for large dense floating-point matrix multiplication, and reports performance improvements over an Intel Core2 Quad at 2.66 GHz.
Abstract:
Field Programmable Gate Arrays (FPGAs) provide a high degree of computational parallelism that can be exploited to achieve large performance improvements in data-intensive processing problems. In this paper, our efforts were directed towards developing a PC cluster based on nodes that use FPGAs as co-processors. The target application is large dense floating-point matrix multiplication. Experimental results for a single node of the cluster, consisting of a Xilinx Virtex 5 VLX50T with a PCI interface, showed a speed-up of 1.19 times over an Intel Core2 Quad at 2.66 GHz. Further analyses of frequency variation and power dissipation were carried out for different matrix sizes running on one node of the cluster. Recently, the platform has been upgraded to a powerful Gidel platform, the PROCe III 260E, which has one Stratix III FPGA per board. On this board, up to 40 multiply-accumulate units (MACs) can be allocated per FPGA, reaching an overall speed-up of approximately 11.2 per node of the cluster when compared with the same general-purpose processor. A full example is presented in this paper.
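To make the role of the MACs in the abstract concrete, the following is a minimal sketch of how the outputs of a dense matrix product could be shared among a fixed pool of multiply-accumulate units. It is an illustration under stated assumptions, not the authors' design: the abstract does not describe the datapath or scheduling, so the round-robin batching and the names `mac`, `matmul_with_macs`, and `mac_passes` are hypothetical, and the units are simulated sequentially in software.

```python
from typing import List

Matrix = List[List[float]]


def mac(acc: float, a: float, b: float) -> float:
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def matmul_with_macs(a: Matrix, b: Matrix, num_macs: int = 40) -> Matrix:
    """Compute C = A x B, handing output elements to MAC units in batches.

    Assumption: each MAC unit produces one output element per batch.
    The batches are executed sequentially here; in hardware the units
    of a batch would run concurrently.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    outputs = [(i, j) for i in range(n) for j in range(m)]
    for start in range(0, len(outputs), num_macs):
        # One batch: at most num_macs output elements, one per MAC unit.
        for i, j in outputs[start:start + num_macs]:
            acc = 0.0
            for p in range(k):
                acc = mac(acc, a[i][p], b[p][j])
            c[i][j] = acc
    return c


def mac_passes(n: int, m: int, num_macs: int) -> int:
    """Number of batches needed to cover all n x m output elements."""
    return -(-(n * m) // num_macs)  # ceiling division
```

Under this simplified model, the number of batches shrinks as more MACs are available, which is one way to read the gain from moving to a board that can hold up to 40 MACs. The model ignores memory bandwidth and PCI transfer costs, which the sketch makes no attempt to capture.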
Citations
Journal ArticleDOI
Low-precision DSP-based floating-point multiply-add fused for field programmable gate arrays
TL;DR: This study proposes floating-point (FP) fused multiply-add units for low-precision formats, which rely on modern Field Programmable Gate Array (FPGA) features such as the integer multiply-accumulate support built into the FPGA DSP blocks.
Journal ArticleDOI
Fast description and synthesis of control-dominant circuits
TL;DR: Applied to the design of a floating-point matrix multiplication hardware accelerator, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, with shorter design times, simpler source code, and no need for advanced hardware design skills.
Proceedings ArticleDOI
Synchronized-transfer-level design methodology applied to hardware matrix multiplication
TL;DR: Applied to the design of a pipelined matrix multiplication circuit, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, with shorter design times, simpler source code, and no need for advanced hardware design skills.
Journal ArticleDOI
The machine learning in the prediction of elections
TL;DR: An analysis and comparison of three different algorithms, run in two classification software packages, Weka and SALSA, as an aid for predicting future elections in the state of Quintana Roo and for demonstrating the efficiency of the algorithms on different data types.
Implementación y optimización del uso de DPS en FPGA en diseño de circuitos a medida para calcular determinantes de orden 4
TL;DR: In this article, two custom digital circuits are designed and implemented to compute the determinant of order-4 matrices using the Laplace theorem algorithm, with 8-bit integers.
References
Journal ArticleDOI
SUMMA: Scalable Universal Matrix Multiplication Algorithm
TL;DR: This paper gives a straightforward, highly efficient, scalable implementation of common matrix multiplication operations that is much simpler than previously published methods, yields better performance, and requires less work space.
Journal ArticleDOI
The density advantage of configurable computing
TL;DR: The author attempts to answer questions as to why FPGAs have been so much more successful than their microprocessor and DSP counterparts and how configurable computing fits into the arsenal of structures used to build general, programmable computing platforms.
Proceedings ArticleDOI
Computational Characteristics of Production Seismic Migration and its Performance on Novel Processor Architectures
Jairo Panetta, P.R.P. de Souza Filho, C.A. da Cunha Filho, F.M.R. da Motta, Silvio Sinedino Pinheiro, Ivan Pedrosa, Andre Luiz Romanelli Rosa, Luiz Monnerat, Leandro T. Carneiro, C.H.B. de Albrecht +9 more
TL;DR: The computational characteristics of the Kirchhoff prestack seismic migration currently used in daily production runs at Petrobras, and its ports to novel architectures and to the PS3, are described in detail.
Book ChapterDOI
An FPGA-Based parallel accelerator for matrix multiplications in the Newton-Raphson method
TL;DR: An FPGA-based Hierarchical-SIMD (H-SIMD) machine, codesigned with the Hierarchical Instruction Set Architecture (HISA), speeds up matrix multiplication (MM) within each Newton-Raphson (NR) iteration and shows sustained high performance.
Proceedings ArticleDOI
Architecture for dense matrix multiplication on a high-performance reconfigurable system
TL;DR: This work presents the analysis and development of an important scientific computing operation: matrix multiplication, targeting the commercial hybrid platform RASC (Reconfigurable Application-Specific Computing), developed by Silicon Graphics, and proposes a case study that uses the available resources in the target platform to explore these features.