Examining the viability of FPGA supercomputing
Citations
HardWare-In-The-Loop simulation table for UAV navigation complexes testing
Design and Evaluation of a Heuristic Optimization Tool Based on Evolutionary Grammars Using PSoCs
Field programmable gate arrays with hardwired networks on chip
Implementing Scientific Simulation Codes Highly Tailored for Vector Architectures Using Custom Configurable Computing Machines
Feasibility of accelerating 2D seismic migration using an application-specific processor implemented on an FPGA
References
Reconfigurable computing: a survey of systems and software
Cell broadband engine architecture and its first implementation: a performance view
Reconfigurable Computing for Digital Signal Processing: A Survey
FPGAs vs. CPUs: trends in peak floating-point performance
64-bit floating-point FPGA matrix multiplication
Frequently Asked Questions (16)
Q2. What is the strongest suit of FPGAs?
The strong suit of FPGAs, however, is low-precision fixed-point or integer arithmetic; no current device families contain dedicated floating-point operators, though dedicated integer multipliers are prevalent.
Q3. What is the efficient multiplication algorithm for large integers?
One of the most efficient multiplication algorithms for large integers utilizes the FFT, treating the number being squared as a long sequence of smaller numbers.
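The approach described above can be sketched in a few lines: the number is split into a sequence of small digits, the sequence is convolved with itself via the FFT, and carries are propagated at the end. Base 10 and a simple recursive FFT are illustrative choices here, not the tuned parameters a production implementation would use.

```python
import cmath

def fft(a, invert=False):
    """Recursive Cooley-Tukey FFT over a list of complex values."""
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def square_via_fft(x):
    """Square a large integer by FFT convolution of its digit sequence."""
    digits = [int(d) for d in str(x)[::-1]]      # least-significant first
    n = 1
    while n < 2 * len(digits):                   # room for the full product
        n *= 2
    a = fft([complex(d) for d in digits] + [0j] * (n - len(digits)))
    b = fft([v * v for v in a], invert=True)     # pointwise square, inverse FFT
    coeffs = [round(v.real / n) for v in b]
    result, carry = 0, 0
    for i, c in enumerate(coeffs):               # propagate decimal carries
        carry += c
        result += (carry % 10) * 10 ** i
        carry //= 10
    return result

print(square_via_fft(123456789) == 123456789 ** 2)  # → True
```

Because the FFT turns convolution into pointwise multiplication, squaring an n-digit number costs O(n log n) digit operations rather than the O(n^2) of schoolbook multiplication.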
Q4. What factors must be weighed when comparing HPC architectures?
When comparing HPC architectures many factors must be weighed, including memory and I/O bandwidth, communication latencies, and peak and sustained performance.
Q5. What is the common requirement for a floating-point arithmetic?
Many HPC applications and benchmarks require double-precision floating-point arithmetic to support a large dynamic range and ensure numerical stability.
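A small illustration (not from the paper) of why the extra mantissa width matters: single precision carries only about 7 decimal digits, so repeatedly adding a small term to a large accumulator can lose the additions entirely. Here float32 rounding is emulated by passing values through `struct`'s `'f'` format.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

total32 = to_f32(1.0e8)
total64 = 1.0e8
for _ in range(10000):
    total32 = to_f32(total32 + 1.0)   # +1 is below float32 resolution at 1e8
    total64 = total64 + 1.0           # double precision still resolves it

print(total32)  # → 100000000.0 (every addition vanished)
print(total64)  # → 100010000.0
```

At a magnitude of 1e8 the float32 spacing between representable values is 8, so each `+ 1.0` rounds back to the starting value; double precision, with its 53-bit mantissa, accumulates all 10,000 additions exactly.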
Q6. How could FPGA designs be made more cost-competitive?
To permit their design to be more cost-competitive, even against efficient software implementations, smaller, more cost-effective FPGAs could be used.
Q7. How much money has been awarded to the first person to identify a large Mersenne prime?
The distributed computing project GIMPS was created to identify large Mersenne primes, and a reward of US$100,000 has been offered for the first person to identify a prime number with greater than 10 million digits.
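Mersenne candidates 2^p - 1 are checked with the Lucas-Lehmer test; the following is a minimal sketch for small exponents (a real GIMPS run makes each squaring step fast with the FFT-based multiplication discussed above).

```python
def lucas_lehmer(p):
    """Return True iff 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m   # one modular squaring per iteration
    return s == 0

# The small Mersenne prime exponents among the first odd primes:
print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
# → [3, 5, 7, 13, 17, 19]   (2**11 - 1 = 2047 = 23 * 89 is composite)
```

The test needs only p - 2 modular squarings, which is why the cost of one large squaring dominates the whole search.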
Q8. What is the common use of floating-point math?
Floating-point arithmetic is so prevalent that LINPACK, the benchmark used to rank supercomputers, heavily utilizes double-precision floating-point math.
Q9. Why is floating-point arithmetic so prevalent in HPC applications?
Due to the prevalence of floating-point arithmetic in HPC applications, research in academia and industry has focused on floating-point hardware designs [14, 15], libraries [16, 17], and development tools [18] to effectively perform floating-point math on FPGAs.
Q10. How many multipliers are needed for the Xilinx design?
For Xilinx’s double-precision floating-point core, 16 of these 18-bit multipliers are required [35] for each floating-point multiplier, while the Dou et al. design needs only nine.
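The difference in multiplier count comes down to how the 53-bit mantissa is split across the 18-bit hardware multipliers; the sketch below is an illustrative reconstruction (not taken from either design's netlist). Four 17-bit limbs give 4 x 4 = 16 partial products, while three 18-bit limbs give only 3 x 3 = 9, at the cost of extra correction logic around the signed 18x18 multiplier blocks.

```python
def limb_product(x, y, limb_bits, limbs):
    """Multiply two integers via limbs sized for a hardware multiplier.

    Returns the product and the number of partial products consumed,
    i.e. the number of hardware multipliers the split would need.
    """
    mask = (1 << limb_bits) - 1
    xs = [(x >> (limb_bits * i)) & mask for i in range(limbs)]
    ys = [(y >> (limb_bits * i)) & mask for i in range(limbs)]
    total, count = 0, 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            total += (xi * yj) << (limb_bits * (i + j))  # one partial product
            count += 1
    return total, count

a = (1 << 53) - 1          # a full-width 53-bit mantissa value
b = (1 << 53) - 12345
assert limb_product(a, b, 17, 4) == (a * b, 16)  # 17-bit limbs: 16 multipliers
assert limb_product(a, b, 18, 3) == (a * b, 9)   # 18-bit limbs: 9 multipliers
```

Both splits produce the exact 106-bit product; the 18-bit-limb variant simply trades dedicated multiplier blocks for additional surrounding logic.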
Q11. How much faster could a reworked implementation achieve?
A slightly reworked implementation, designed as an FFT accelerator with all serial functions implemented on an attached processor, could achieve a speedup of 2.6 compared to a processor alone.
Q12. What has the availability of high-performance clusters incorporating FPGAs prompted?
The availability of high-performance clusters incorporating FPGAs has prompted efforts to explore acceleration of HPC applications.
Q13. What is the key contribution of this paper?
The key contributions of this paper are the addition of an economic analysis to a discussion of FPGA supercomputing projects and the presentation of an effective benchmark for comparing FPGAs and processors on an equal footing.
Q14. What does a traditional port of the algorithm from software to hardware involve?
Performing a traditional port of the algorithm from software to hardware involves the creation of a floating-point FFT on the FPGA.
Q15. What is the speedup of the stand-alone FPGA?
In spite of the unique all-integer algorithmic approach, the stand-alone FPGA implementation only achieved a speedup of 1.76 compared to a 3.4 GHz Pentium 4 processor.
Q16. What is the difference between the Dou et al. and Underwood design?
While there is always a danger in drawing conclusions from a small data set, both the Dou et al. and Underwood design results point to a crossover sometime around 2009 to 2012, when the largest FPGA devices, like those typically found in commercial FPGA-augmented HPC clusters, will be cost-competitive with processors for double-precision floating-point calculations.