Proceedings Article

An FPGA-Based Accelerator to Speed-Up Matrix Multiplication of Floating Point Operations

TLDR
The authors develop a PC cluster whose nodes use FPGAs as co-processors for floating-point large dense matrix multiplication; a single node shows performance improvements over an Intel Core2 Quad at 2.66 GHz.
Abstract
Field Programmable Gate Arrays (FPGAs) provide a high degree of computational parallelism that can be exploited to achieve large performance improvements in data-intensive processing problems. In this paper, we develop a PC cluster whose nodes use FPGAs as co-processors. The target application is floating-point large dense matrix multiplication. Experimental results for a single cluster node, consisting of a Xilinx Virtex-5 VLX50T with a PCI interface, show a speed-up of 1.19 times over an Intel Core2 Quad at 2.66 GHz. Frequency variation and power dissipation were also analyzed for different matrix sizes running on one node of the cluster. More recently, the platform has been upgraded to a more powerful Gidel platform, the PROCe III 260E, which carries one Stratix III FPGA per board. On this board, up to 40 multiply-accumulate (MAC) units can be allocated per FPGA, reaching an overall speed-up of approximately 11.2 times per cluster node compared with the same general-purpose processor. A full example is presented in this paper.
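To make the role of the MAC count more concrete, here is a minimal software model of how an array of parallel multiply-accumulate units could share the work of a dense matrix product. It is a hypothetical sketch under simple assumptions (each MAC unit owns one output element and completes one multiply-accumulate per simulated cycle; memory transfers, pipeline latency, and the host link are ignored). The function name `mac_array_matmul` and its cycle-counting scheme are illustrative only and are not the authors' architecture; the default of 40 MACs simply mirrors the per-FPGA figure quoted for the Stratix III board.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mac_array_matmul(a: Matrix, b: Matrix, num_macs: int = 40) -> Tuple[Matrix, int]:
    """Multiply a (n x k) by b (k x m) on a simulated array of MAC units.

    Hypothetical model: output elements are dealt out in groups of
    `num_macs`; within a group each MAC unit owns one element C[i][j]
    and performs one multiply-accumulate per simulated cycle, so a group
    finishes after k cycles. Returns the product and the cycle count.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    # Flatten the output index space so it can be assigned to MAC units.
    outputs = [(i, j) for i in range(n) for j in range(m)]
    cycles = 0
    for start in range(0, len(outputs), num_macs):
        group = outputs[start:start + num_macs]
        acc = [0.0] * len(group)  # one accumulator register per active MAC
        for p in range(k):  # one simulated cycle per inner-product step
            for lane, (i, j) in enumerate(group):
                acc[lane] += a[i][p] * b[p][j]
            cycles += 1
        # Write each accumulator back to its output element.
        for lane, (i, j) in enumerate(group):
            c[i][j] = acc[lane]
    return c, cycles


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    product, cycles = mac_array_matmul(a, b)
    print(product, cycles)  # [[19.0, 22.0], [43.0, 50.0]] 2
```

Under these assumptions the cycle count drops from n·k·m with a single MAC to ⌈n·m/P⌉·k with P MACs. The measured speed-ups reported above (1.19 and about 11.2) presumably also reflect clock frequency, data transfer, and host overheads that this model deliberately omits.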


Citations
Journal Article

Low-precision DSP-based floating-point multiply-add fused for field programmable gate arrays

TL;DR: This study proposes FP multiply-add fused units for low-precision formats which rely on modern Field Programmable Gate Array (FPGA) features such as the integer multiply-accumulate support built into the FPGA DSP blocks.
Journal Article

Fast description and synthesis of control-dominant circuits

TL;DR: Applied to the design of a floating-point matrix multiplication hardware accelerator, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, but with shorter design times, simpler source code, and no need for advanced hardware design skills.
Proceedings Article

Synchronized-transfer-level design methodology applied to hardware matrix multiplication

TL;DR: Applied to the design of a pipelined matrix multiplication circuit, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, but with shorter design times, simpler source code, and no need for advanced hardware design skills.
Journal Article

The machine learning in the prediction of elections

TL;DR: Analyzes and compares three algorithms, using two classification tools (Weka and SALSA), as an aid to predicting future elections in the state of Quintana Roo, and demonstrates the algorithms' efficiency with different data types.

Implementation and optimization of the use of DSPs in FPGAs in the design of custom circuits to compute order-4 determinants

TL;DR: This article presents the design and implementation of two custom digital circuits that compute the determinant of order-4 matrices using Laplace's theorem (cofactor expansion), with 8-bit integers.