Proceedings ArticleDOI

A practical automatic polyhedral parallelizer and locality optimizer

TL;DR: An automatic polyhedral source-to-source transformation framework that optimizes regular programs for parallelism and locality simultaneously, implemented as a tool that automatically generates OpenMP parallel code from C program sections.
Abstract
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
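The central transformation the abstract describes, tiling for parallelism and locality, can be pictured with a small hand-written sketch. The code below is a hypothetical illustration of the kind of OpenMP code such a tool emits, not actual output of the framework; the tile size T is fixed by hand here, whereas the framework derives transformations and tile shapes from its integer-linear-programming cost model.

```c
/* Sketch of tiling for parallelism and locality on C = C + A*B.
   T is a fixed, hypothetical tile size; an automatic framework would
   derive the transformation and tile sizes itself. */
enum { N = 256, T = 32 };

/* Reference version: straightforward triply nested loop. */
void matmul_naive(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Tiled form: the outer it/jt/kt loops walk tiles of the iteration
   space; the inner loops stay within one tile so blocks of A, B, and C
   are reused from cache. Distinct (it, jt) pairs write disjoint tiles
   of C, so the tile loops can run in parallel. The pragma is ignored
   by compilers built without OpenMP. */
void matmul_tiled(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    #pragma omp parallel for collapse(2)
    for (int it = 0; it < N; it += T)
        for (int jt = 0; jt < N; jt += T)
            for (int kt = 0; kt < N; kt += T)
                for (int i = it; i < it + T; i++)
                    for (int k = kt; k < kt + T; k++)
                        for (int j = jt; j < jt + T; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

Both versions perform the same additions per element in the same order, so their results match exactly; only the traversal order of the iteration space differs.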
Citations
Proceedings ArticleDOI

Memory-centric accelerator design for Convolutional Neural Networks

TL;DR: It is shown that the effects of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workloads and ensures that on-chip memory size is minimized, which reduces area and energy usage.
Journal ArticleDOI

Polyhedral parallel code generation for CUDA

TL;DR: A novel source-to-source compiler called PPCG is presented, which introduces a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency according to the constraints of modern GPUs.
Proceedings ArticleDOI

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

TL;DR: This work presents a code generation and auto-tuning framework for stencil computations targeted at multi- and many-core processors, such as multicore CPUs and graphics processing units. It generates compute kernels from a specification of the stencil operation together with a parallelization and optimization strategy, and leverages the auto-tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
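As a concrete picture of what such a framework produces, here is a minimal hand-written 3-point Jacobi sweep in C with a plain OpenMP strategy. This is an illustrative sketch, not actual PATUS output; in PATUS both the kernel and the parallelization strategy would be generated from separate specifications.

```c
/* Minimal 3-point Jacobi sweep: each interior point becomes the
   average of itself and its two neighbors. Hand-written illustration;
   the OpenMP pragma stands in for a parallelization strategy and is
   ignored by compilers built without OpenMP. */
void jacobi_sweep(const double *u, double *u_new, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        u_new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0;
    /* Boundary values pass through unchanged. */
    u_new[0] = u[0];
    u_new[n - 1] = u[n - 1];
}
```

A real stencil generator would additionally apply blocking across sweeps (time tiling) and vectorization, which is exactly the strategy-dependent parameter space the auto-tuner explores.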
Posted Content

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

TL;DR: This work contributes Tensor Comprehensions, a language close to the mathematics of deep learning that offers both imperative and declarative styles; a polyhedral just-in-time compiler that converts a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization; and a compilation cache populated by an autotuner.
Journal ArticleDOI

Polly — performing polyhedral optimizations on a low-level intermediate representation

TL;DR: Polly is presented, an infrastructure for polyhedral optimizations on the compiler's internal, low-level intermediate representation (IR), along with an interface for connecting external optimizers and a novel way of using the parallelism they introduce to generate SIMD and OpenMP code.
References
Book

Theory of Linear and Integer Programming

Journal ArticleDOI

A Generalized inverse for matrices

TL;DR: A generalization of the inverse of a non-singular matrix is described as the unique solution of a certain set of equations; it is used for solving linear matrix equations and for finding an expression for the principal idempotent elements of a matrix.
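For reference, the "certain set of equations" is the four Penrose conditions: the generalized inverse X of a matrix A is the unique matrix satisfying

```latex
AXA = A, \qquad XAX = X, \qquad (AX)^{*} = AX, \qquad (XA)^{*} = XA,
```

where * denotes the conjugate transpose. The unique X satisfying all four conditions is now known as the Moore-Penrose pseudoinverse.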
Proceedings ArticleDOI

A data locality optimizing algorithm

TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
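Of the transformations listed, loop interchange is the simplest to illustrate. The sketch below is a hypothetical example in C, not taken from the paper: both functions copy a matrix, but the interchanged version walks the row-major array contiguously, so each cache line fetched is fully used before it is evicted.

```c
enum { M = 64 };

/* Column-major traversal of a row-major array: successive inner
   iterations touch different rows, i.e. different cache lines. */
void copy_colmajor(const double src[M][M], double dst[M][M]) {
    for (int j = 0; j < M; j++)
        for (int i = 0; i < M; i++)
            dst[i][j] = src[i][j];
}

/* After loop interchange (one of the transformations the algorithm
   applies), the inner loop walks contiguous memory. The loop body and
   the set of iterations are unchanged; only their order differs. */
void copy_rowmajor(const double src[M][M], double dst[M][M]) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            dst[i][j] = src[i][j];
}
```

Interchange is legal here because no iteration depends on another; in general, the algorithm must check data dependences before reordering.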
Journal ArticleDOI

New trends in high performance computing

TL;DR: The automatically tuned linear algebra software (ATLAS) project is described, as well as the fundamental principles that underlie it, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.

Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147)

TL;DR: This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it, with the present emphasis on the Basic Linear Algebra Subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
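The empirical-optimization methodology can be sketched in miniature: time each candidate value of a tuning parameter on the target machine and keep the fastest. Everything below (the blocked-transpose kernel, the candidate list) is a hypothetical toy; real ATLAS searches a far larger space of kernel parameters.

```c
#include <time.h>

enum { AN = 512 };
static double X[AN][AN], Y[AN][AN];

/* One timed trial: blocked matrix transpose with block size b. */
static double time_transpose(int b) {
    clock_t t0 = clock();
    for (int it = 0; it < AN; it += b)
        for (int jt = 0; jt < AN; jt += b)
            for (int i = it; i < it + b && i < AN; i++)
                for (int j = jt; j < jt + b && j < AN; j++)
                    Y[j][i] = X[i][j];
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Empirical search in the ATLAS spirit: run every candidate block
   size on this machine and return the fastest one. */
int pick_block_size(void) {
    const int candidates[] = { 8, 16, 32, 64, 128 };
    const int ncand = sizeof candidates / sizeof candidates[0];
    int best = candidates[0];
    double best_time = time_transpose(candidates[0]);
    for (int c = 1; c < ncand; c++) {
        double t = time_transpose(candidates[c]);
        if (t < best_time) {
            best_time = t;
            best = candidates[c];
        }
    }
    return best;
}
```

Because the winning block size depends on the cache hierarchy of the machine running the search, the result is tuned to that machine, which is exactly the point of the empirical approach.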