An FPGA-Based Accelerator to Speed-Up Matrix Multiplication of Floating Point Operations

doi:10.1109/IPDPS.2011.165

Proceedings Article

- pp 306-309

TLDR

The paper develops a PC cluster whose nodes use FPGAs as co-processors for large dense floating-point matrix multiplication, and reports performance improvements compared with the Intel Core2 Quad at 2.66 GHz.

Abstract:

Field Programmable Gate Arrays (FPGAs) can provide a high degree of computational parallelism that can be exploited to achieve large performance improvements in data-intensive processing problems. In this paper, our efforts were directed towards developing a PC cluster based on nodes that use FPGAs as co-processors. The target application is a large dense floating-point matrix multiplication. Experimental results for just one node of the cluster, consisting of a Xilinx Virtex 5 VLX50T with a PCI interface, showed performance improvements compared with the Intel Core2 Quad at 2.66 GHz, achieving a speed-up of 1.19 times. Further analyses of frequency variation and power dissipation were made by considering different matrix sizes running on one node of the cluster. Recently, the platform has been updated to a powerful Gidel platform, the PROCe III 260E. This new platform has one Stratix III FPGA per board. On this board, it is possible to allocate up to 40 MACs per FPGA, reaching an overall speed-up of approximately 11.2 times per node of the cluster when compared with the same general-purpose processor. A full example is presented in this paper.
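
As a rough illustration of the multiply-accumulate (MAC) view of dense matrix multiplication that the abstract refers to, the following is a minimal software sketch and not the paper's hardware design. The function name `matmul_with_macs`, the `num_macs` parameter, and the round-robin assignment of output elements to MAC units are all assumptions made for illustration; the abstract does not describe how work is scheduled across the MACs.

```python
def matmul_with_macs(a, b, num_macs=4):
    """Multiply a (n x k) by b (k x m) using plain Python lists.

    Each output element is assigned round-robin to one of `num_macs`
    hypothetical MAC units (an assumption, not the paper's scheduling).
    Returns the product and the number of MAC operations per unit.
    """
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"

    c = [[0.0] * m for _ in range(n)]
    mac_ops = [0] * num_macs  # operations issued to each hypothetical unit

    for i in range(n):
        for j in range(m):
            unit = (i * m + j) % num_macs  # round-robin assignment
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate
                mac_ops[unit] += 1
            c[i][j] = acc
    return c, mac_ops


if __name__ == "__main__":
    product, ops = matmul_with_macs([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product)
    print(ops)
```

The total number of MAC operations is n·m·k regardless of how many units share the work; in this simplified model, adding units only reduces the number of operations each unit performs, which is the kind of parallelism a multi-MAC design exploits.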

Citations

Journal Article

Low-precision DSP-based floating-point multiply-add fused for field programmable gate arrays

Alexandru Amaricai, +2 more

- 14 Jul 2014 -

IET Computers and Digital Techniques

TL;DR: This study proposes FP multiply-add fused units for low-precision formats which rely on modern Field Programmable Gate Array (FPGA) features such as the integer multiply-accumulate support built into the FPGA DSP blocks.

Journal Article

Fast description and synthesis of control-dominant circuits

Marc-Andre Daigneault, +1 more

- 01 May 2014 -

Computers & Electrical Engineering

TL;DR: Applied to the design of a floating-point matrix multiplication hardware accelerator, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, but with shorter design times, simpler source code, and no need for advanced hardware design skills.

Proceedings Article

Synchronized-transfer-level design methodology applied to hardware matrix multiplication

Marc-Andre Daigneault, +1 more

TL;DR: Applied to the design of the pipelined matrix multiplication circuit, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, but with shorter design times, simpler source code, and no need for advanced hardware design skills.

Journal Article

The machine learning in the prediction of elections

José A. León-Borges, +3 more

TL;DR: An analysis and comparison of three different algorithms, using two classification software packages, Weka and SALSA, as an aid for predicting future elections in the state of Quintana Roo and for demonstrating the efficiency of the algorithms with different data types.

Implementación y optimización del uso de DPS en FPGA en diseño de circuitos a medida para calcular determinantes de orden 4

Francisco Plascencia Jauregui, +3 more

TL;DR: This article covers the design and implementation of two custom digital circuits for computing the determinant of order-4 matrices by means of the Laplace theorem algorithm, using 8-bit integers.

References

Journal Article

SUMMA: Scalable Universal Matrix Multiplication Algorithm

Robert A. van de Geijn, +1 more

- 01 Apr 1995 -

Concurrency and Computation: Practice an...

TL;DR: This paper gives a straightforward, highly efficient, scalable implementation of common matrix multiplication operations that is much simpler than previously published methods, yields better performance, and requires less work space.

Journal Article

The density advantage of configurable computing

André DeHon

- 01 Apr 2000 -

IEEE Computer

TL;DR: The author attempts to answer questions as to why FPGAs have been so much more successful than their microprocessor and DSP counterparts and how configurable computing fits into the arsenal of structures used to build general, programmable computing platforms.

Proceedings Article

Computational Characteristics of Production Seismic Migration and its Performance on Novel Processor Architectures

Jairo Panetta, +9 more

TL;DR: The computational characteristics of the Kirchhoff prestack seismic migration currently used in daily production runs at Petrobras are presented, and its port to novel architectures, including the PS3, is described in detail.

Book Chapter

An FPGA-Based parallel accelerator for matrix multiplications in the newton-raphson method

Xizhen Xu, +2 more

TL;DR: An FPGA-based Hierarchical-SIMD (H-SIMD) machine, co-designed with the Hierarchical Instruction Set Architecture (HISA), is used to speed up MM within each NR iteration and shows sustained high performance.

Proceedings Article

Architecture for dense matrix multiplication on a high-performance reconfigurable system

Viviane Lucy Santos Souza, +2 more

TL;DR: This work presents the analysis and development of an important scientific computing operation: matrix multiplication, targeting the commercial hybrid platform RASC (Reconfigurable Application-Specific Computing), developed by Silicon Graphics, and proposes a case study that uses the available resources in the target platform to explore these features.

Related Papers (5)

FPGA accelerator for floating-point matrix multiplication

Zeljko Jovanovic, +1 more

- 25 Oct 2012 -

IET Computers and Digital Techniques

IEEE Transactions on Computers

FPGA Based Acceleration of the Linpack Benchmark: A High Level Code Transformation Approach

K. Turkington, +3 more

Related Papers (5)

FPGA accelerator for floating-point matrix multiplication

Designing scalable FPGA-based reduction circuits using pipelined floating-point cores

Spectral Method Characterization on FPGA and GPU Accelerators

A High Performance and Memory Efficient LU Decomposer on FPGAs

FPGA Based Acceleration of the Linpack Benchmark: A High Level Code Transformation Approach