Open AccessProceedings ArticleDOI

Managing performance vs. accuracy trade-offs with loop perforation

TLDR
The results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
Abstract
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date, such computations typically use ad-hoc, domain-specific techniques developed specifically for the computation at hand. Loop perforation provides a general technique to trade accuracy for performance by transforming loops to execute a subset of their iterations. A criticality testing phase filters out critical loops (whose perforation produces unacceptable behavior) to identify tunable loops (whose perforation produces more efficient and still acceptably accurate computations). A perforation space exploration algorithm perforates combinations of tunable loops to find Pareto-optimal perforation policies. Our results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
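
The three steps named in the abstract (perforating a loop, filtering out unacceptable perforations, and keeping Pareto-optimal policies) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the stride-based way of skipping iterations, the rescaling of the partial sum, and the use of an iteration count as a stand-in for measured speedup are all assumptions made for illustration. The 10% error bound is borrowed from the figure quoted in the abstract and used here only as an example threshold.

```python
def perforated_sum(values, stride=1):
    """Sum only every `stride`-th element, then rescale to estimate the full sum.

    stride=1 runs the original loop; stride=2 skips every other iteration.
    """
    total = 0.0
    executed = 0
    for i in range(0, len(values), stride):  # perforated loop: a subset of iterations
        total += values[i]
        executed += 1
    if executed == 0:
        return 0.0
    return total * len(values) / executed  # extrapolate from the sampled iterations


def relative_error(exact, approx):
    """Relative change in the result caused by perforation."""
    if exact == 0:
        return abs(approx)
    return abs(exact - approx) / abs(exact)


def pareto_front(policies):
    """Keep (name, speedup, error) tuples that no other policy dominates.

    A policy dominates another if it is at least as fast and at least as
    accurate, and strictly better in one of the two.
    """
    front = []
    for p in policies:
        dominated = any(
            q[1] >= p[1] and q[2] <= p[2] and (q[1] > p[1] or q[2] < p[2])
            for q in policies
        )
        if not dominated:
            front.append(p)
    return front


def explore(values, strides, max_error=0.10):
    """Try each stride, drop those exceeding the error bound, return the Pareto front.

    Speedup is approximated by the ratio of original to executed iterations
    (an assumption; a real evaluation would measure running time).
    """
    if not values:
        return []
    exact = perforated_sum(values, 1)
    candidates = []
    for s in strides:
        err = relative_error(exact, perforated_sum(values, s))
        if err <= max_error:  # analogous to filtering out unacceptable perforations
            executed = len(range(0, len(values), s))
            candidates.append((s, len(values) / executed, err))
    return pareto_front(candidates)
```

Calling `explore(list(range(1, 1001)), [1, 2, 4])` returns one `(stride, speedup_proxy, error)` tuple per stride that stays within the bound and is not dominated. A sum over smoothly varying data tolerates perforation well; a loop whose skipped iterations change control flow or corrupt state would be the kind of critical loop the abstract says must be filtered out.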


Citations
Journal ArticleDOI

A survey on quality-assurance approximate stream processing and applications

TL;DR: A comprehensive study of approximate computing techniques for data streams is presented, classifying them as data-driven methods, computing-driven methods, and combinations of the two in emerging distributed processing environments.
Posted Content

Automatic Software Diversity in the Light of Test Suites

TL;DR: The investigation of the influence of test suites on sosiefication exploits the following observation: test suites cover the different regions of programs very unequally. The authors hypothesize that sosie synthesis performs differently on a statement that is covered by one hundred test cases than on a statement that is covered by a single test case.
Proceedings ArticleDOI

Dynamic Multi-Resolution Data Storage

TL;DR: Varifocal Storage dynamically adjusts the dataset resolution within a storage device, thereby mitigating the performance bottleneck of exchanging/preparing data for approximate compute kernels and offers flexible, efficient support for approximate and exact computing without exceeding the costs of conventional storage systems.
Journal ArticleDOI

GEVO: GPU Code Optimization Using Evolutionary Computation

TL;DR: GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR that improve performance on desired criteria while retaining required functionality, achieving a 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10.
Proceedings ArticleDOI

Static Program Analysis for Identifying Energy Bugs in Graphics-Intensive Mobile Apps

TL;DR: A novel static optimization technique that eliminates drawing commands to produce energy-efficient apps is proposed; the results indicate savings of up to 44% of the total energy consumption of the device.
References
Proceedings ArticleDOI

LLVM: a compilation framework for lifelong program analysis & transformation

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Journal ArticleDOI

The JPEG still picture compression standard

TL;DR: The Baseline method has been by far the most widely implemented JPEG method to date, and is sufficient in its own right for a large number of applications.
Proceedings ArticleDOI

The PARSEC benchmark suite: characterization and architectural implications

TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.