Open Access | Proceedings Article | DOI

Managing performance vs. accuracy trade-offs with loop perforation

TLDR
The results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
Abstract
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date, such computations typically use ad-hoc, domain-specific techniques developed specifically for the computation at hand. Loop perforation provides a general technique to trade accuracy for performance by transforming loops to execute a subset of their iterations. A criticality testing phase filters out critical loops (whose perforation produces unacceptable behavior) to identify tunable loops (whose perforation produces more efficient and still acceptably accurate computations). A perforation space exploration algorithm perforates combinations of tunable loops to find Pareto-optimal perforation policies. Our results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
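As a rough illustration of the idea in the abstract, the sketch below perforates a simple averaging loop by running only every `stride`-th iteration, then applies a toy acceptance check in the spirit of criticality testing. Everything here is an assumption made for illustration: the function names, the stride-based skipping scheme, the relative-change metric, and the reuse of the 10% figure as an acceptance bound are not taken from the paper's implementation.

```python
def mean_exact(values):
    """Baseline loop: visits every element."""
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)


def mean_perforated(values, stride=2):
    """Perforated loop: executes only every `stride`-th iteration (assumed scheme)."""
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):  # the skipped iterations never run
        total += values[i]
        count += 1
    return total / count


def relative_change(exact, approx):
    """Hypothetical accuracy metric: relative difference between the two outputs."""
    return abs(exact - approx) / abs(exact)


def is_tunable(values, stride, bound=0.10):
    """Toy criticality check: accept the perforated loop if the output moves by < bound."""
    change = relative_change(mean_exact(values), mean_perforated(values, stride))
    return change < bound


data = [float(x) for x in range(1, 1001)]
print(mean_exact(data), mean_perforated(data, stride=2))
print(is_tunable(data, stride=2))          # smooth input: perforation is tolerated
print(is_tunable([0.0, 100.0], stride=2))  # tiny, uneven input: perforation is rejected
```

With `stride=2` the perforated loop does about half the work of the baseline. Whether that is acceptable depends entirely on the input and on the accuracy metric chosen, which is why a per-loop filtering step of this kind comes before any search over combinations of perforated loops.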



Citations
Posted Content

Redundant Loads: A Software Inefficiency Indicator

TL;DR: This work develops LoadSpy, a whole-program profiler that pinpoints redundant memory load operations, which are often a symptom of redundant operations in programs, and uses it to optimize several well-known benchmarks and real-world applications, yielding significant speedups.
Proceedings ArticleDOI

Autogenerating Fast Packet-Processing Code Using Program Synthesis

TL;DR: This work applies program synthesis to build a code generator, Chipmunk, for a simulator of the protocol-independent switch architecture (PISA), which generates code for many programs that a previous code generator based on classical compiler optimizations rejects and uses much fewer hardware resources.
Journal ArticleDOI

A Cross-Layer Multicore Architecture to Tradeoff Program Accuracy and Resilience Overheads

TL;DR: Declarative resilience is proposed, which selectively applies resilience schemes to both crucial and non-crucial code while ensuring program correctness, and improves completion time by an average of 21 percent over a state-of-the-art hardware resilience scheme that protects all executed code.
Proceedings ArticleDOI

Sculptor: Flexible Approximation with Selective Dynamic Loop Perforation

TL;DR: This paper introduces selective dynamic loop perforation, a general approximation technique that automatically transforms loops to skip selected instructions in selected iterations, and proposes several compiler optimizations to resolve these challenges, including optimized instruction-level, load-based, and store-based selective perforation, and self-directed dynamic perforation with a dynamic start and dynamic perforation rates.
Journal ArticleDOI

Toward Self-Tunable Approximate Computing

TL;DR: An approximate, self-adaptive architecture that autotunes itself at runtime based on the workload; it compares well with other approximation methods and keeps the output error within the given maximum error threshold at relatively low area overhead.
References
Proceedings ArticleDOI

LLVM: a compilation framework for lifelong program analysis & transformation

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Journal ArticleDOI

The JPEG still picture compression standard

TL;DR: The Baseline method has been by far the most widely implemented JPEG method to date, and is sufficient in its own right for a large number of applications.
Proceedings ArticleDOI

The PARSEC benchmark suite: characterization and architectural implications

TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.