Proceedings ArticleDOI
Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance
Keith D. Underwood, Karl Scott Hemmert, et al.
pp. 219-228
TL;DR: The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs, set in the historical context of the last six years and extrapolated for the next six.
Abstract:
Field programmable gate arrays (FPGAs) have long been an attractive alternative to microprocessors for computing tasks - as long as floating-point arithmetic is not required. Fueled by the advance of Moore's law, FPGAs are rapidly reaching sufficient densities to enhance peak floating-point performance as well. The question, however, is how much of this peak performance can be sustained. This paper examines three of the basic linear algebra subroutine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply. A comparison of microprocessors, FPGAs, and reconfigurable computing platforms is performed for each operation. The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs. This analysis considers the historical context of the last six years and is extrapolated for the next six years.
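As a rough illustration (not taken from the paper), the three BLAS kernels examined differ in how many floating-point operations they perform per value moved from memory, which is why their sustained performance depends so differently on memory bandwidth versus internal storage. The naive sketch below, with hypothetical function names, makes the flop-to-data ratios explicit:

```python
# Illustrative sketch only: naive versions of the three BLAS kernels the
# paper analyzes, annotated with their approximate flop-to-data ratios.

def dot(x, y):
    """Level-1 BLAS: ~2n flops over 2n loaded values -> bandwidth-bound."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc

def gemv(A, x):
    """Level-2 BLAS: ~2n^2 flops over ~n^2 loaded values -> still
    roughly one multiply-add per element of A streamed in."""
    return [dot(row, x) for row in A]

def gemm(A, B):
    """Level-3 BLAS: ~2n^3 flops over ~3n^2 values -> data reuse is what
    makes sustaining peak possible, given enough on-chip storage."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C
```

The contrast in reuse is the crux of the paper's question: dot product and matrix-vector multiply touch each operand once, so bandwidth caps sustained performance, while matrix multiply can reuse blocks held in internal storage.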
Citations
Book
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Scott Hauck, André DeHon, et al.
TL;DR: This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles", with which to implement this powerful technology.
Proceedings ArticleDOI
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
TL;DR: This paper analyzes an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores, and presents optimization strategies and use cases where each device is most effective.
Proceedings ArticleDOI
Sparse Matrix-Vector multiplication on FPGAs
Ling Zhuo, Viktor K. Prasanna, et al.
TL;DR: Besides solving the SpMxV problem, the design provides a parameterized and flexible tree-based architecture for floating-point applications on FPGAs, and demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure.
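As a quick illustration (not from the cited design), sparse matrix-vector multiplication is usually expressed over a compressed sparse row (CSR) format; the per-row dot product is the reduction that a hardware design can map onto a tree of floating-point adders. A minimal software sketch, with hypothetical names:

```python
# Illustrative sketch only: y = A @ x with A stored in CSR form.
# values   - nonzero entries, row by row
# col_idx  - column index of each nonzero
# row_ptr  - row_ptr[r]..row_ptr[r+1] delimits row r's nonzeros

def csr_spmv(values, col_idx, row_ptr, x):
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # the per-row reduction
        y.append(acc)
    return y
```

The irregularity the TL;DR refers to shows up here as rows of wildly varying length, which is what makes load balancing hard on general-purpose processors.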
Proceedings ArticleDOI
64-bit floating-point FPGA matrix multiplication
TL;DR: This paper presents a 64-bit ANSI/IEEE Std 754-1985 floating-point design of a hardware matrix multiplier optimized for FPGA implementation, and implements a scalable linear array of processing elements (PEs) supporting the proposed algorithm in Xilinx Virtex-II Pro technology.
Journal ArticleDOI
Reconfigurable Computing Architectures
TL;DR: This work surveys the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.
References
Journal ArticleDOI
New trends in high performance computing
TL;DR: The automatically tuned linear algebra software (ATLAS) project is described, as well as the fundamental principles that underlie it, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147)
TL;DR: This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it, with the present emphasis on the Basic Linear Algebra Subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
Proceedings ArticleDOI
FPGAs vs. CPUs: trends in peak floating-point performance
TL;DR: This paper examines the impact of Moore's law on the peak floating-point performance of FPGAs, and results show that peak FPGA floating-point performance is growing significantly faster than peak floating-point performance for a CPU.
Proceedings ArticleDOI
Quantitative analysis of floating point arithmetic on FPGA based custom computing machines
TL;DR: This paper quantifies properties, including area consumption and speed, of working arithmetic operator units used in real-time applications, and shows that using higher-level languages, like VHDL, facilitates the development of custom operators without significantly impacting operator performance or area.