Open Access, Proceedings Article

Managing performance vs. accuracy trade-offs with loop perforation

TLDR
The results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
Abstract
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date, such computations typically use ad-hoc, domain-specific techniques developed specifically for the computation at hand. Loop perforation provides a general technique to trade accuracy for performance by transforming loops to execute a subset of their iterations. A criticality testing phase filters out critical loops (whose perforation produces unacceptable behavior) to identify tunable loops (whose perforation produces more efficient and still acceptably accurate computations). A perforation space exploration algorithm perforates combinations of tunable loops to find Pareto-optimal perforation policies. Our results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
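The ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`mean_perforated`, `explore`, `pareto_front`), the stride-based way of skipping iterations, the use of relative error as the accuracy metric, and the iteration-count ratio as a stand-in for measured speedup are all assumptions made for illustration. The sketch perforates a single averaging loop, discards perforation rates whose error exceeds a bound, and keeps the Pareto-optimal (speedup, error) pairs.

```python
def mean_exact(values):
    """Baseline loop: visits every element."""
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)


def mean_perforated(values, stride):
    """Perforated loop: runs only every `stride`-th iteration (stride >= 1).

    Returns the approximate mean and the number of iterations executed.
    """
    total = 0.0
    executed = 0
    for i in range(0, len(values), stride):
        total += values[i]
        executed += 1
    return total / executed, executed


def relative_error(exact, approx):
    """Assumed accuracy metric: relative change in the output."""
    return abs(exact - approx) / abs(exact)


def pareto_front(points):
    """Keep the (speedup, error) points that no other point dominates."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


def explore(values, strides, max_error=0.10):
    """Try each stride, drop those over the error bound, return the Pareto set."""
    exact = mean_exact(values)
    candidates = []
    for stride in strides:
        approx, executed = mean_perforated(values, stride)
        error = relative_error(exact, approx)
        if error <= max_error:  # hypothetical acceptability bound
            # Iteration-count proxy for speedup, not a wall-clock measurement.
            speedup = len(values) / executed
            candidates.append((speedup, error))
    return pareto_front(candidates)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    for speedup, error in explore(data, strides=[1, 2, 3], max_error=0.2):
        print(f"speedup ~{speedup:.1f}x, relative error {error:.3f}")
```

A real system would time actual runs and perforate many loops in combination; this sketch only shows how a single loop's perforation rate trades executed iterations against output error.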


Citations
Journal Article

FTxAC: Leveraging the Approximate Computing Paradigm in the Design of Fault-Tolerant Embedded Systems to Reduce Overheads

TL;DR: Results show that, depending on the level of approximation, the evaluated application, and the fault tolerance strategy employed, the performance of the system can be improved, even counteracting completely the implicit overheads of the redundancy.
Proceedings Article

Approximating Memory-bound Applications on Mobile GPUs

TL;DR: This work approximates six applications and evaluates them on two mobile GPU architectures with very different memory layouts, showing that, even when the local memory is not mapped to dedicated fast memory in hardware, kernel perforation is still capable of speedup.
Journal Article

Predictive Compliance Monitoring in Process-Aware Information Systems: State of the Art, Functionalities, Research Directions

TL;DR: This article presents a comprehensive predictive compliance monitoring system that integrates existing predicate prediction approaches with the idea of employing PPM with different prediction goals, such as next activity or remaining time, for prediction and subsequent mapping of the prediction results onto the given set of compliance constraints (PCM).
Journal Article

Approximate Logic Synthesis Using Boolean Matrix Factorization

TL;DR: The authors present a method for approximate logic synthesis based on Boolean matrix factorization, in which an arbitrary input circuit can be approximated in a controlled fashion, along with a unified approach that enables the factorization algorithm to use semiring algebra, field algebra, and a combination of both for truth table factorization.
Proceedings Article

Space-Efficient Pointwise Computation of the Distance Transform on GPUs

TL;DR: The brute-force distance transform algorithm achieves the threefold goals of memory efficiency, flexibility, and performance by decomposing it into a map and a reduction pattern on the massively parallel architecture of a modern Graphics Processing Unit (GPU).
References
Proceedings Article

LLVM: a compilation framework for lifelong program analysis & transformation

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Journal Article

The JPEG still picture compression standard

TL;DR: The Baseline method has been by far the most widely implemented JPEG method to date, and is sufficient in its own right for a large number of applications.
Proceedings Article

The PARSEC benchmark suite: characterization and architectural implications

TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.