Single-precision floating-point format
About: Single-precision floating-point format is a research topic. Over its lifetime, 721 publications have appeared on this topic, receiving 11,065 citations. The topic is also known as: binary32 & single.
TL;DR: An implementation of generalized Born implicit solvent all-atom classical molecular dynamics within the AMBER program package that runs entirely on CUDA enabled NVIDIA graphics processing units (GPUs) and shows performance that is on par with, and in some cases exceeds, that of traditional supercomputers.
Abstract: We present an implementation of generalized Born implicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA enabled NVIDIA graphics processing units (GPUs). We discuss the algorithms that are used to exploit the processing power of the GPUs and show the performance that can be achieved in comparison to simulations on conventional CPU clusters. The implementation supports three different precision models in which the contributions to the forces are calculated in single precision floating point arithmetic but accumulated in double precision (SPDP), or everything is computed in single precision (SPSP) or double precision (DPDP). In addition to performance, we have focused on understanding the implications of the different precision models on the outcome of implicit solvent MD simulations. We show results for a range of tests including the accuracy of single point force evaluations and energy conservation as well as structural properties pertaining to protein dynamics. The numerical noise due to rounding errors within the SPSP precision model is sufficiently large to lead to an accumulation of errors which can result in unphysical trajectories for long time scale simulations. We recommend the use of the mixed-precision SPDP model since the numerical results obtained are comparable with those of the full double precision DPDP model and the reference double precision CPU implementation but at significantly reduced computational cost. Our implementation provides performance for GB simulations on a single desktop that is on par with, and in some cases exceeds, that of traditional supercomputers.
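The SPDP idea described in this abstract can be sketched in a few lines: individual terms are rounded to single precision, but the running sum is kept in double precision, so small contributions are not swamped by a large accumulator. This is an illustrative sketch, not the AMBER implementation; the function name and the contrived input are hypothetical.

```python
import numpy as np

def accumulate(contribs, acc_dtype):
    """Sum contributions computed in float32, using the given accumulator
    dtype: np.float32 mimics SPSP, np.float64 mimics SPDP."""
    total = acc_dtype(0.0)
    for c in np.asarray(contribs, dtype=np.float32):
        total = acc_dtype(total + acc_dtype(c))
    return float(total)

# A large term followed by small ones: the float32 accumulator loses
# the small contributions, while the float64 accumulator keeps them.
contribs = [2.0 ** 24] + [1.0] * 4
print(accumulate(contribs, np.float32))  # 16777216.0 (the four 1.0s are lost)
print(accumulate(contribs, np.float64))  # 16777220.0 (exact)
```

The same mechanism is why SPSP trajectories drift over long simulations: each force accumulation step discards information below the single-precision rounding threshold of the running total.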
19 Dec 2018
TL;DR: In this paper, the authors demonstrate the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets.
Abstract: The state-of-the-art hardware platforms for training deep neural networks are moving from traditional single precision (32-bit) computations towards 16 bits of precision - in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations. However, unlike inference, training with numbers represented with less than 16 bits has been challenging due to the need to maintain fidelity of the gradient computations during back-propagation. Here we demonstrate, for the first time, the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets. In addition to reducing the data and computation precision to 8 bits, we also successfully reduce the arithmetic precision for additions (used in partial product accumulation and weight updates) from 32 bits to 16 bits through the introduction of a number of key ideas including chunk-based accumulation and floating point stochastic rounding. The use of these novel techniques lays the foundation for a new generation of hardware training platforms with the potential for 2-4 times improved throughput over today's systems.
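Chunk-based accumulation, as named in the abstract above, can be illustrated with a small sketch: summing within fixed-size chunks in low precision and combining the chunk partials at higher precision keeps the low-precision accumulator close to the magnitude of its terms, which bounds rounding-error growth. The function name, the chunk size, and the use of float16 as the "low" precision are assumptions for illustration, not the paper's hardware scheme.

```python
import numpy as np

def chunked_sum_fp16(values, chunk_size=64):
    """Sum float16 values chunk by chunk: each chunk is accumulated in
    float16, but chunk partials are combined in float32."""
    values = np.asarray(values, dtype=np.float16)
    total = np.float32(0.0)
    for start in range(0, len(values), chunk_size):
        partial = np.float16(0.0)
        for v in values[start:start + chunk_size]:
            partial = np.float16(partial + v)   # low-precision within a chunk
        total = np.float32(total + np.float32(partial))  # high-precision combine
    return total
```

A naive float16 running sum of 10,000 copies of 0.1 stalls once the total grows so large that 0.1 falls below half a unit in the last place; the chunked sum stays close to the true value of about 999.76 because no float16 accumulator ever grows beyond one chunk's worth of terms.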
26 Apr 2004
TL;DR: The impact of floating-point units on the design of an energy-efficient architecture for the matrix multiply kernel is discussed, and it is shown that FPGAs are capable of achieving up to 6x improvement in terms of the GFLOPS/W metric over that of general purpose processors.
Abstract: Summary form only given. FPGAs are increasingly being used in the high performance and scientific computing community to implement floating-point based hardware accelerators. We analyze the floating-point multiplier and adder/subtractor units by considering the number of pipeline stages of the units as a parameter and use throughput/area as the metric. We achieve throughput rates of more than 240 MHz (200 MHz) for single (double) precision operations by deeply pipelining the units. To illustrate the impact of the floating-point units on a kernel, we implement a matrix multiplication kernel based on our floating-point units and show that a state-of-the-art FPGA device is capable of achieving about 15 GFLOPS (8 GFLOPS) for the single (double) precision floating-point based matrix multiplication. We also show that FPGAs are capable of achieving up to 6x improvement (for single precision) in terms of the GFLOPS/W (performance per unit power) metric over that of general purpose processors. We then discuss the impact of floating-point units on the design of an energy-efficient architecture for the matrix multiply kernel.
TL;DR: A high-order finite-element application that performs the numerical simulation of seismic wave propagation resulting from earthquakes in the Earth on NVIDIA GeForce 8800 GTX and GTX 280 graphics cards using CUDA is ported.
Abstract: We port a high-order finite-element application that performs the numerical simulation of seismic wave propagation resulting from earthquakes in the Earth on NVIDIA GeForce 8800 GTX and GTX 280 graphics cards using CUDA. This application runs in single precision and is therefore a good candidate for implementation on current GPU hardware, which either does not support double precision or supports it but at the cost of reduced performance. We discuss and compare two implementations of the code: one that has maximum efficiency but is limited to the memory size of the card, and one that can handle larger problems but that is less efficient. We use a coloring scheme to handle efficiently summation operations over nodes on a topology with variable valence. We perform several numerical tests and performance measurements and show that in the best case we obtain a speedup of 25.
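A coloring scheme of the kind mentioned in this abstract can be sketched as a greedy graph coloring: two elements conflict if they share a mesh node, and all elements of one color can then accumulate their contributions into shared nodes in parallel without write conflicts. This is a hypothetical sketch of the general technique, not the authors' CUDA implementation; element and node identifiers are illustrative.

```python
from collections import defaultdict

def color_elements(elements):
    """Greedily color elements (each a tuple of node ids) so that no two
    elements sharing a node receive the same color. Elements of one color
    can then be processed concurrently without racing on shared nodes."""
    # Build the conflict graph: two elements conflict if they share a node.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)
    neighbors = defaultdict(set)
    for elems in node_to_elems.values():
        for a in elems:
            for b in elems:
                if a != b:
                    neighbors[a].add(b)
    # Greedy assignment: smallest color not used by any conflicting element.
    colors = {}
    for e in range(len(elements)):
        used = {colors[nb] for nb in neighbors[e] if nb in colors}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors
```

With variable valence, the number of colors is bounded by the maximum number of elements meeting at any node, so the serial overhead of looping over colors stays small compared to the parallel work within each color.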
TL;DR: This paper develops a general purpose Lattice Boltzmann code that runs entirely on a single GPU and shows that single precision floating point arithmetic is sufficient for LBM computation in comparison to double precision.
Abstract: Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. In this paper, we develop a general purpose Lattice Boltzmann code that runs entirely on a single GPU. The results show that: (1) single precision floating point arithmetic is sufficient for LBM computation in comparison to double precision; (2) the implementation of LBM on GPUs allows us to achieve up to about one billion lattice updates per second using single precision floating point; (3) GPUs provide an inexpensive alternative to large clusters for fluid dynamics prediction.