Proceedings ArticleDOI
Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance
Keith D. Underwood, Karl Scott Hemmert, et al.
pp. 219-228
TL;DR: The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs, set in the historical context of the last six years and extrapolated for the next six.
Abstract:
Field programmable gate arrays (FPGAs) have long been an attractive alternative to microprocessors for computing tasks - as long as floating-point arithmetic is not required. Fueled by the advance of Moore's law, FPGAs are rapidly reaching sufficient densities to enhance peak floating-point performance as well. The question, however, is how much of this peak performance can be sustained. This paper examines three of the basic linear algebra subroutine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply. A comparison of microprocessors, FPGAs, and reconfigurable computing platforms is performed for each operation. The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs. This analysis considers the historical context of the last six years and is extrapolated for the next six years.
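As a rough illustration (not taken from the paper), the three BLAS kernels examined differ in how many floating-point operations they perform per value moved from memory, which is why their sustained performance depends so differently on memory bandwidth versus internal storage. The naive sketch below, with hypothetical function names, makes the flop-to-data ratios explicit:

```python
# Illustrative sketch only: naive versions of the three BLAS kernels the
# paper analyzes, annotated with their approximate flop-to-data ratios.

def dot(x, y):
    """Level-1 BLAS: ~2n flops over 2n loaded values -> bandwidth-bound."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc

def gemv(A, x):
    """Level-2 BLAS: ~2n^2 flops over ~n^2 loaded values -> still
    roughly one multiply-add per element of A streamed in."""
    return [dot(row, x) for row in A]

def gemm(A, B):
    """Level-3 BLAS: ~2n^3 flops over ~3n^2 values -> data reuse is what
    makes sustaining peak possible, given enough on-chip storage."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C
```

The contrast in reuse is the crux of the paper's question: dot product and matrix-vector multiply touch each operand once, so bandwidth caps sustained performance, while matrix multiply can reuse blocks held in internal storage.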
Citations
Book
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Scott Hauck, André DeHon, et al.
TL;DR: This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles", with which to implement this powerful technology.
Proceedings ArticleDOI
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
TL;DR: This paper analyzes an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores, and presents optimization strategies and use cases where each device is most effective.
Proceedings ArticleDOI
Sparse Matrix-Vector multiplication on FPGAs
Ling Zhuo, Viktor K. Prasanna, et al.
TL;DR: Besides solving the SpMxV problem, the design provides a parameterized and flexible tree-based architecture for floating-point applications on FPGAs, and demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure.
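As a quick illustration (not from the cited design), sparse matrix-vector multiplication is usually expressed over a compressed sparse row (CSR) format; the per-row dot product is the reduction that a hardware design can map onto a tree of floating-point adders. A minimal software sketch, with hypothetical names:

```python
# Illustrative sketch only: y = A @ x with A stored in CSR form.
# values   - nonzero entries, row by row
# col_idx  - column index of each nonzero
# row_ptr  - row_ptr[r]..row_ptr[r+1] delimits row r's nonzeros

def csr_spmv(values, col_idx, row_ptr, x):
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # the per-row reduction
        y.append(acc)
    return y
```

The irregularity the TL;DR refers to shows up here as rows of wildly varying length, which is what makes load balancing hard on general-purpose processors.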
Proceedings ArticleDOI
64-bit floating-point FPGA matrix multiplication
TL;DR: This paper presents a 64-bit ANSI/IEEE Std 754-1985 floating-point design of a hardware matrix multiplier optimized for FPGA implementation, and implements a scalable linear array of processing elements (PEs) supporting the proposed algorithm in Xilinx Virtex-II Pro technology.
Journal ArticleDOI
Reconfigurable Computing Architectures
TL;DR: This work surveys the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.
References
Journal ArticleDOI
New trends in high performance computing
TL;DR: The automatically tuned linear algebra software (ATLAS) project is described, as well as the fundamental principles that underlie it, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147)
TL;DR: This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it, with the present emphasis on the Basic Linear Algebra Subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
Proceedings ArticleDOI
FPGAs vs. CPUs: trends in peak floating-point performance
TL;DR: This paper examines the impact of Moore's law on the peak floating-point performance of FPGAs, and results show that peak FPGA floating-point performance is growing significantly faster than peak floating-point performance for a CPU.
Proceedings ArticleDOI
Quantitative analysis of floating point arithmetic on FPGA based custom computing machines
TL;DR: This paper quantifies properties, including area consumption and speed, of working arithmetic operator units used in real-time applications, and shows that using higher-level languages, like VHDL, facilitates the development of custom operators without significantly impacting operator performance or area.