A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

doi:10.1145/3151032

Open Access, Journal Article

Alexandra Angerd, +2 more

05 Dec 2017

ACM Transactions on Architecture and Cod...

Vol. 14, Iss. 4, pp. 46

TLDR

The article proposes an automated precision-selection method and a novel GPU register file organization that densely stores floating-point register values at arbitrary precisions, which can enable GPUs to keep up to twice as many threads in flight simultaneously.

Abstract:

Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision of floating-point values in a controlled fashion requires support at both the compiler and the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, a lower precision of each floating-point register can allow more floating-point values to be packed into a register file. This, however, calls for new register file organizations.

This article proposes an automated precision-selection method and a novel GPU register file organization that can densely store floating-point register values at arbitrary precisions. The automated precision-selection method uses a data-driven approach for setting the precision level of floating-point values, given a quality threshold and a representative set of input data. By allowing a small, but acceptable, degradation in output quality, our method can remove a significant fraction of the bits needed to represent floating-point values in the investigated kernels (between 28% and 60%). Our proposed register file organization exploits these lower-precision floating-point values by packing several of them into the same physical register. This reduces the register pressure per thread by up to 48%, and by 27% on average, for a negligible output-quality degradation. This can enable GPUs to keep up to twice as many threads in flight simultaneously.
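
The data-driven idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy shading kernel, the greedy search order, the maximum-absolute-error quality metric, and all names (`truncate_mantissa`, `kernel`, `select_precisions`, `registers_needed`, `INPUTS`) are hypothetical and chosen only for illustration. The sketch lowers the mantissa width of each intermediate value for as long as the output on a representative input set stays within a quality threshold, then estimates how many 32-bit registers the values would occupy if packed densely.

```python
import math
import struct

VARS = ("dot", "lit", "color")  # hypothetical intermediate values of the toy kernel
FULL_MANTISSA = 23              # mantissa bits of an IEEE 754 binary32 value


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Convert x to binary32 and zero all but the top `keep_bits` mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (FULL_MANTISSA - keep_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def kernel(sample, prec):
    """Toy diffuse-shading kernel; each intermediate is stored at its own precision."""
    n, l, albedo = sample
    dot = truncate_mantissa(n[0] * l[0] + n[1] * l[1] + n[2] * l[2], prec["dot"])
    lit = truncate_mantissa(max(dot, 0.0), prec["lit"])
    return truncate_mantissa(albedo * lit, prec["color"])


def max_error(inputs, reference, prec):
    """Assumed quality metric: worst absolute deviation from the reference output."""
    return max(abs(kernel(s, prec) - r) for s, r in zip(inputs, reference))


def select_precisions(inputs, threshold):
    """Greedily drop mantissa bits per value while the error stays within threshold."""
    full = {v: FULL_MANTISSA for v in VARS}
    reference = [kernel(s, full) for s in inputs]
    prec = dict(full)
    for v in VARS:
        while prec[v] > 0:
            trial = dict(prec, **{v: prec[v] - 1})
            if max_error(inputs, reference, trial) > threshold:
                break
            prec = trial
    return prec


def registers_needed(prec, reg_bits=32):
    """Registers used if values (1 sign + 8 exponent + mantissa bits) are packed densely."""
    total_bits = sum(1 + 8 + m for m in prec.values())
    return math.ceil(total_bits / reg_bits)


# Representative input set (made up): (surface normal, light direction, albedo).
INPUTS = [
    ((0.0, 0.0, 1.0), (0.0, 0.6, 0.8), 0.5),
    ((0.6, 0.0, 0.8), (0.0, 0.0, 1.0), 0.9),
    ((0.0, 0.8, 0.6), (0.6, 0.8, 0.0), 0.7),
    ((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), 0.3),
]

if __name__ == "__main__":
    chosen = select_precisions(INPUTS, threshold=1e-2)
    full = {v: FULL_MANTISSA for v in VARS}
    print("mantissa bits per value:", chosen)
    print("registers at full precision:", registers_needed(full))
    print("registers after reduction:", registers_needed(chosen))
```

A looser threshold lets the search remove more bits, and the chosen widths are only as trustworthy as the input set is representative. The greedy per-value order is a simplification; other search strategies and perceptual quality metrics could be substituted without changing the overall structure.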
