Open Access Book Chapter

Effective source-to-source outlining to support whole program empirical optimization

Abstract
Although automated empirical performance optimization and tuning is well studied for kernels and domain-specific libraries, a current research grand challenge is how to extend these methodologies and tools to significantly larger sequential and parallel applications. In this context, we present the ROSE source-to-source outliner, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. Our outliner aims to handle large-scale C/C++, Fortran, and OpenMP applications. A set of program analysis and transformation techniques is used to enhance the portability, scalability, and interoperability of source-to-source outlining. More importantly, the generated kernels preserve the performance characteristics of the tuning targets and can be easily handled by other tools. Preliminary evaluations have shown that the ROSE outliner serves as a key component within an end-to-end empirical optimization system and enables a wide range of sequential and parallel optimization opportunities.

Citations
Journal Article

hiCUDA: High-Level GPGPU Programming

TL;DR: hiCUDA, a high-level directive-based language for CUDA programming, is designed; it allows programmers to perform tedious tasks in a simpler manner, directly on the sequential code, thus speeding up the porting process.
Proceedings Article

Mitigating the compiler optimization phase-ordering problem using machine learning

TL;DR: This paper develops a new approach that automatically selects good optimization orderings on a per-method basis within a dynamic compiler and uses neuro-evolution to construct an artificial neural network that is capable of predicting beneficial optimization orderings for a piece of code that is being optimized.

The ROSE Source-to-Source Compiler Infrastructure

TL;DR: This talk will focus on the design and motivation for ROSE as an open community source-to-source compiler infrastructure to support performance optimization, tools for analysis, verification, and software assurance.
Proceedings Article

Online Adaptive Code Generation and Tuning

TL;DR: This paper presents a runtime compilation and tuning framework for parallel programs that combines traditional feedback directed optimization and just-in-time compilation, and shows that the system can leverage available parallelism in today's HPC platforms by evaluating different code-variants on different nodes simultaneously.
Book Chapter

A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries

TL;DR: This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries, and presents a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries.
References
Proceedings Article

Automatically Tuned Linear Algebra Software

TL;DR: An approach is described for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units, focusing on the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS).
Proceedings Article

A fast Fourier transform compiler

TL;DR: The internals of this special-purpose compiler, called genfft, are described in some detail, and it is argued that a specialized compiler is a valuable tool.
Proceedings Article

Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

TL;DR: This paper addresses the problem of how to select tile sizes and unroll factors simultaneously by means of iterative compilation and shows how to quantitatively trade off the number of profiles needed against the level of optimization that can be reached.
Journal Article

Heap reference analysis using access graphs

TL;DR: In this article, the authors propose an end-to-end static analysis to distinguish live objects from reachable objects and use this information to make dead objects unreachable by modifying the program.