Open Access · Journal Article

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

TLDR
This article proposes an automated precision-selection method and a novel GPU register file organization that densely stores floating-point register values at arbitrary precisions, which can enable GPUs to keep up to twice as many threads in flight simultaneously.
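To make dense, arbitrary-precision register storage concrete, here is a minimal Python sketch. It assumes, as an illustration only, that a reduced-precision value keeps the binary32 sign bit and 8-bit exponent plus the top m mantissa bits (truncated, not rounded), and that values are laid back to back across 32-bit registers, so one value may straddle two registers. The helper names `to_code`, `from_code`, `pack`, and `unpack` are hypothetical; the article's actual register layout may differ.

```python
import struct

# IEEE-754 binary32 field widths; REG_BITS is an assumed 32-bit physical register.
SIGN_BITS, EXP_BITS, F32_MANT_BITS = 1, 8, 23
REG_BITS = 32


def to_code(x: float, mant_bits: int) -> int:
    """Encode x as sign | exponent | top `mant_bits` mantissa bits (low bits truncated)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> (F32_MANT_BITS - mant_bits)


def from_code(code: int, mant_bits: int) -> float:
    """Widen a reduced-precision code back to binary32 by zero-filling the dropped bits."""
    bits = code << (F32_MANT_BITS - mant_bits)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pack(values, mant_bits):
    """Lay reduced-precision codes back to back across 32-bit registers.

    A value may straddle two registers, so no bits are spent on padding.
    """
    width = SIGN_BITS + EXP_BITS + mant_bits
    stream = 0  # Python ints are arbitrary precision, so this acts as a bit stream
    for i, v in enumerate(values):
        stream |= to_code(v, mant_bits) << (i * width)
    n_regs = (width * len(values) + REG_BITS - 1) // REG_BITS  # ceiling division
    return [(stream >> (r * REG_BITS)) & 0xFFFFFFFF for r in range(n_regs)]


def unpack(regs, count, mant_bits):
    """Inverse of pack(): recover `count` values from the packed registers."""
    width = SIGN_BITS + EXP_BITS + mant_bits
    stream = 0
    for r, word in enumerate(regs):
        stream |= word << (r * REG_BITS)
    mask = (1 << width) - 1
    return [from_code((stream >> (i * width)) & mask, mant_bits) for i in range(count)]


# Eight values that are exact with 3 mantissa bits (12-bit codes):
# 8 * 12 = 96 bits, i.e. 3 registers instead of 8.
values = [1.0, -2.5, 0.15625, 3.0, 0.75, -0.5, 6.0, 1.125]
regs = pack(values, mant_bits=3)
print(len(regs), unpack(regs, len(values), mant_bits=3) == values)
```

On the occupancy side, if the number of resident threads is limited only by register-file capacity, cutting registers per thread by 48% lets 1/0.52 ≈ 1.9 times as many threads fit, consistent with the "up to twice" figure; per-SM thread limits and register allocation granularity can reduce that gain in practice.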
Abstract
Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision of floating-point values in a controlled fashion requires support at both the compiler and the microarchitecture level. At the compiler level, a method is needed to automate the precision reduction of each floating-point value. At the microarchitecture level, lowering the precision of each floating-point register allows more floating-point values to be packed into a register file, which in turn calls for new register file organizations.

This article proposes an automated precision-selection method and a novel GPU register file organization that densely stores floating-point register values at arbitrary precisions. The precision-selection method takes a data-driven approach: given a quality threshold and a representative set of input data, it sets the precision level of each floating-point value. By allowing a small but acceptable degradation in output quality, our method removes between 28% and 60% of the bits needed to represent floating-point values in the investigated kernels. Our proposed register file organization exploits these lower-precision values by packing several of them into the same physical register. This reduces register pressure per thread by up to 48%, and by 27% on average, with negligible output-quality degradation, which can enable GPUs to keep up to twice as many threads in flight simultaneously.
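The data-driven precision selection can be sketched as a search over per-value mantissa widths. This is a minimal illustration under stated assumptions, not the article's algorithm: it uses a greedy, one-bit-at-a-time search per value, PSNR against the full-precision output as the quality metric (chosen only for simplicity), a hypothetical toy diffuse-shading kernel (`toy_kernel`), and a synthetic random data set standing in for the representative inputs.

```python
import math
import random
import struct


def quantize(x: float, mant_bits: int) -> float:
    """Round x to binary32, then truncate its mantissa to `mant_bits` bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mant_bits
    return struct.unpack("<f", struct.pack("<I", (bits >> drop) << drop))[0]


def toy_kernel(sample, prec):
    """Hypothetical diffuse-shading kernel: albedo * max(0, normal . light).

    Every named intermediate is stored with prec[name] mantissa bits.
    """
    normal, light, albedo = sample
    products = (quantize(a, prec["normal"]) * quantize(b, prec["light"])
                for a, b in zip(normal, light))
    dot = quantize(sum(products), prec["dot"])
    return quantize(albedo * max(0.0, dot), prec["out"])


def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB (an assumed quality metric)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def select_precision(kernel, names, dataset, threshold_db):
    """Greedily lower each value's mantissa width while PSNR stays >= threshold_db."""
    full = {name: 23 for name in names}
    reference = [kernel(s, full) for s in dataset]
    prec = dict(full)
    for name in names:
        while prec[name] > 0:
            trial = {**prec, name: prec[name] - 1}
            quality = psnr(reference, [kernel(s, trial) for s in dataset])
            if quality < threshold_db:
                break  # removing one more bit would violate the threshold
            prec = trial
    return prec


def random_unit(rng):
    """Random 3-D unit vector (synthetic stand-in for real input data)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


rng = random.Random(0)
dataset = [(random_unit(rng), random_unit(rng), rng.random()) for _ in range(200)]
names = ["normal", "light", "dot", "out"]
prec = select_precision(toy_kernel, names, dataset, threshold_db=40.0)
# Share of the 32-bit storage removed (sign + 8 exponent bits are always kept).
removed = 1 - sum(9 + m for m in prec.values()) / (32 * len(names))
print(prec, f"{removed:.0%} of bits removed")
```

The greedy pass is order-dependent and assumes quality drops monotonically as bits are removed; the savings it prints apply only to this toy kernel and say nothing about the article's 28% to 60% range.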

Citations
Journal Article

Tools for Reduced Precision Computation: A Survey

TL;DR: There is still a gap to close in automation of reduced precision customization, especially for tools based on static analysis rather than profiling, as well as for integration within mainstream, industry-strength compiler frameworks.
Journal Article

Energy-Efficient Iterative Refinement Using Dynamic Precision

TL;DR: Novel methods are proposed that let iterative refinement use variable-precision arithmetic dynamically within a loop (i.e., a trans-precision approach), restructure a numeric algorithm at run time according to its numeric behavior, and remove unnecessary accuracy checks.
Journal Article

MemSZ: Squeezing Memory Traffic with Lossy Compression

TL;DR: MemSZ introduces a low latency, parallel design of the Squeeze (SZ) algorithm offering aggressive compression ratios, up to 16:1 in the authors' implementation, and improves the execution time, energy, and memory traffic by up to 15%, 9%, and 64%, respectively.
Proceedings Article

AVR: Reducing Memory Traffic with Approximate Value Reconstruction

TL;DR: Approximate Value Reconstruction (AVR) reduces the memory traffic of applications that tolerate approximations in their datasets, significantly improving system performance and energy efficiency, and supports the compression scheme so as to maximize its effect and minimize its overheads.
Journal Article

L2C: Combining Lossy and Lossless Compression on Memory and I/O

TL;DR: L2C employs general-purpose lossless compression and combines it with state-of-the-art lossy compression to achieve compression ratios up to 16:1 and to improve the utilization of chip’s bandwidth resources.