Effective source-to-source outlining to support whole program empirical optimization
Chunhua Liao,Daniel J. Quinlan,Richard Vuduc,Thomas Panas +3 more
- Vol. 5898, pp 308-322
TLDR
The ROSE source-to-source outliner is presented, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks.
Abstract:
Although automated empirical performance optimization and tuning is well-studied for kernels and domain-specific libraries, a current research grand challenge is how to extend these methodologies and tools to significantly larger sequential and parallel applications. In this context, we present the ROSE source-to-source outliner, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. Our outliner aims to handle large-scale C/C++, Fortran, and OpenMP applications. A set of program analysis and transformation techniques is used to enhance the portability, scalability, and interoperability of source-to-source outlining. More importantly, the generated kernels preserve the performance characteristics of tuning targets and can be easily handled by other tools. Preliminary evaluations have shown that the ROSE outliner serves as a key component within an end-to-end empirical optimization system and enables a wide range of sequential and parallel optimization opportunities.
Citations
Journal ArticleDOI
hiCUDA: High-Level GPGPU Programming
TL;DR: hiCUDA, a high-level directive-based language for CUDA programming, is designed; it allows programmers to perform tedious tasks in a simpler manner and directly on the sequential code, thus speeding up the porting process.
Proceedings ArticleDOI
Mitigating the compiler optimization phase-ordering problem using machine learning
Sameer Kulkarni,John Cavazos +1 more
TL;DR: This paper develops a new approach that automatically selects good optimization orderings on a per method basis within a dynamic compiler and uses neuro-evolution to construct an artificial neural network that is capable of predicting beneficial optimization ordering for a piece of code that is being optimized.
The ROSE Source-to-Source Compiler Infrastructure
Daniel J. Quinlan,Chunhua Liao +1 more
TL;DR: This talk will focus on the design and motivation for ROSE as an open community source-to-source compiler infrastructure to support performance optimization, tools for analysis, verification and software assurance, and general cusiness.
Proceedings ArticleDOI
Online Adaptive Code Generation and Tuning
TL;DR: This paper presents a runtime compilation and tuning framework for parallel programs that combines traditional feedback directed optimization and just-in-time compilation, and shows that the system can leverage available parallelism in today's HPC platforms by evaluating different code-variants on different nodes simultaneously.
Book ChapterDOI
A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries
TL;DR: This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries, and presents a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries.
References
Proceedings ArticleDOI
Automatically Tuned Linear Algebra Software
R. Clint Whaley,Jack Dongarra +1 more
TL;DR: An approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units using the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS).
Proceedings ArticleDOI
A fast Fourier transform compiler
TL;DR: The internals of this special-purpose compiler, called genfft, are described in some detail, and it is argued that a specialized compiler is a valuable tool.
Proceedings ArticleDOI
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
TL;DR: This paper addresses the problem of how to select tile sizes and unroll factors simultaneously by means of iterative compilation and shows how to quantitatively trade-off the number of profiles needed and the level of optimization that can be reached.
Journal ArticleDOI
Heap reference analysis using access graphs
TL;DR: In this article, the authors propose an end-to-end static analysis to distinguish live objects from reachable objects and use this information to make dead objects unreachable by modifying the program.