A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs
TLDR
The authors propose an automated precision-selection method and a novel GPU register file organization that can densely store floating-point register values at arbitrary precisions; together, these can enable GPUs to keep up to twice as many threads in flight simultaneously.
Abstract:
Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision of floating-point values in a controlled fashion requires support at both the compiler and the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, lowering the precision of each floating-point register can allow more floating-point values to be packed into a register file; this, however, calls for new register file organizations.
This article proposes an automated precision-selection method and a novel GPU register file organization that can densely store floating-point register values at arbitrary precisions. The automated precision-selection method uses a data-driven approach to set the precision level of floating-point values, given a quality threshold and a representative set of input data. By allowing a small but acceptable degradation in output quality, our method can remove a significant fraction of the bits needed to represent floating-point values in the investigated kernels (between 28% and 60%). Our proposed register file organization exploits these lower-precision floating-point values by packing several of them into the same physical register. This reduces the register pressure per thread by up to 48%, and by 27% on average, with negligible output-quality degradation. This can enable GPUs to keep up to twice as many threads in flight simultaneously.
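The two ideas in the abstract can be illustrated with a small Python sketch: data-driven per-value precision selection against a quality threshold, and dense packing of reduced-precision values into physical registers. This is not the authors' implementation. It makes several assumptions. Only float32 mantissa bits are dropped, while the sign and 8-bit exponent are kept. A made-up shading kernel (`toy_kernel`) stands in for a real graphics kernel. A maximum-absolute-error check stands in for the paper's quality metric. The search (`select_widths`) is a simple greedy one, and the packing (`pack_registers`) is plain first-fit into 32-bit registers. All of these names are hypothetical.

```python
import struct

MANT = 23  # float32 mantissa bits
EXP = 8    # float32 exponent bits; this sketch never reduces them


def encode(x: float, m: int) -> int:
    """Round x to float32, then keep the sign, exponent and top m mantissa bits."""
    (word,) = struct.unpack("<I", struct.pack("<f", x))
    return word >> (MANT - m)


def decode(code: int, m: int) -> float:
    """Expand a (1 + EXP + m)-bit code back to float32, zero-filling dropped bits."""
    (y,) = struct.unpack("<f", struct.pack("<I", code << (MANT - m)))
    return y


def truncate(x: float, m: int) -> float:
    """Value x as seen after storing it with only m mantissa bits."""
    return decode(encode(x, m), m)


def toy_kernel(inp, widths):
    """Hypothetical shading kernel: albedo * max(0, n . l).

    Every named intermediate is truncated to the mantissa width in `widths`.
    """
    n, l, albedo = inp

    def q(name, v):
        return truncate(v, widths[name])

    ndotl = q("ndotl", sum(q("n", a) * q("l", b) for a, b in zip(n, l)))
    return q("out", q("albedo", albedo) * max(0.0, ndotl))


def select_widths(kernel, names, inputs, threshold):
    """Greedily lower each value's mantissa width while the max absolute
    error over the representative inputs stays within `threshold`."""
    widths = {name: MANT for name in names}
    ref = [kernel(x, widths) for x in inputs]  # full float32 reference outputs

    def acceptable(w):
        return all(abs(kernel(x, w) - r) <= threshold for x, r in zip(inputs, ref))

    for name in names:
        while widths[name] > 0:
            trial = {**widths, name: widths[name] - 1}
            if not acceptable(trial):
                break
            widths = trial
    return widths


def pack_registers(codes, widths, reg_bits=32):
    """First-fit packing of variable-width codes into reg_bits-wide registers.

    Returns the register words and a layout mapping name -> (register, bit offset).
    """
    regs, used, layout = [], [], {}
    for name in sorted(codes, key=lambda k: -widths[k]):  # widest values first
        size = 1 + EXP + widths[name]
        for i in range(len(regs)):
            if used[i] + size <= reg_bits:
                break
        else:  # no register has room: allocate a new one
            regs.append(0)
            used.append(0)
            i = len(regs) - 1
        regs[i] |= codes[name] << used[i]
        layout[name] = (i, used[i])
        used[i] += size
    return regs, layout


def unpack(regs, layout, widths, name):
    """Extract one value from its register slot and re-expand it to float32."""
    i, offset = layout[name]
    size = 1 + EXP + widths[name]
    return decode((regs[i] >> offset) & ((1 << size) - 1), widths[name])


if __name__ == "__main__":
    names = ["n", "l", "albedo", "ndotl", "out"]
    inputs = [((0.0, 0.6, 0.8), (0.0, 0.8, 0.6), 0.5),
              ((1.0, 0.0, 0.0), (0.6, 0.8, 0.0), 0.9)]
    widths = select_widths(toy_kernel, names, inputs, threshold=1e-2)
    values = {"n": 0.6, "l": 0.8, "albedo": 0.5, "ndotl": 0.96, "out": 0.48}
    codes = {k: encode(v, widths[k]) for k, v in values.items()}
    regs, layout = pack_registers(codes, widths)
    print("mantissa widths:", widths)
    print(f"{len(values)} values packed into {len(regs)} 32-bit registers")
```

In this encoding, a value kept at m mantissa bits occupies 9 + m bits, so two values share one 32-bit register once m ≤ 7. The occupancy claim follows from similar arithmetic. With a fixed-size register file, cutting per-thread register demand by 48% lets 1/0.52 ≈ 1.9 times as many threads reside at once. That is consistent with "up to twice as many threads", provided no other resource limits occupancy first.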
Citations
Journal Article
Tools for Reduced Precision Computation: A Survey
Stefano Cherubin, Giovanni Agosta
TL;DR: There is still a gap to close in automation of reduced precision customization, especially for tools based on static analysis rather than profiling, as well as for integration within mainstream, industry-strength compiler frameworks.
Journal Article
Energy-Efficient Iterative Refinement Using Dynamic Precision
TL;DR: The authors propose methods that let iterative refinement use variable-precision arithmetic dynamically within a loop (a trans-precision approach), restructure a numeric algorithm at run time according to its numeric behavior, and remove unnecessary accuracy checks.
Journal Article
MemSZ: Squeezing Memory Traffic with Lossy Compression
TL;DR: MemSZ introduces a low latency, parallel design of the Squeeze (SZ) algorithm offering aggressive compression ratios, up to 16:1 in the authors' implementation, and improves the execution time, energy, and memory traffic by up to 15%, 9%, and 64%, respectively.
Proceedings Article
AVR: Reducing Memory Traffic with Approximate Value Reconstruction
TL;DR: Approximate Value Reconstruction (AVR) reduces the memory traffic of applications that tolerate approximations in their datasets, significantly improving system performance and energy efficiency, and provides support for its compression scheme that maximizes the scheme's effect while minimizing its overheads.
Journal Article
L2C: Combining Lossy and Lossless Compression on Memory and I/O
TL;DR: L2C combines general-purpose lossless compression with state-of-the-art lossy compression to achieve compression ratios of up to 16:1 and to improve the utilization of the chip's bandwidth resources.