Proceedings ArticleDOI

A practical automatic polyhedral parallelizer and locality optimizer

TL;DR: An automatic polyhedral source-to-source transformation framework that optimizes regular programs for parallelism and locality simultaneously, implemented as a tool that automatically generates OpenMP parallel code from C program sections.
Abstract
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
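The central transformation the abstract describes, tiling for parallelism and locality, can be pictured with a small hand-written sketch. The code below is a hypothetical illustration of the kind of OpenMP code such a tool emits, not actual output of the framework; the tile size T is fixed by hand here, whereas the framework derives transformations and tile shapes from its integer-linear-programming cost model.

```c
/* Sketch of tiling for parallelism and locality on C = C + A*B.
   T is a fixed, hypothetical tile size; an automatic framework would
   derive the transformation and tile sizes itself. */
enum { N = 256, T = 32 };

/* Reference version: straightforward triply nested loop. */
void matmul_naive(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Tiled form: the outer it/jt/kt loops walk tiles of the iteration
   space; the inner loops stay within one tile so blocks of A, B, and C
   are reused from cache. Distinct (it, jt) pairs write disjoint tiles
   of C, so the tile loops can run in parallel. The pragma is ignored
   by compilers built without OpenMP. */
void matmul_tiled(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    #pragma omp parallel for collapse(2)
    for (int it = 0; it < N; it += T)
        for (int jt = 0; jt < N; jt += T)
            for (int kt = 0; kt < N; kt += T)
                for (int i = it; i < it + T; i++)
                    for (int k = kt; k < kt + T; k++)
                        for (int j = jt; j < jt + T; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

Both versions perform the same additions per element in the same order, so their results match exactly; only the traversal order of the iteration space differs.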
Citations
Proceedings ArticleDOI

Memory-centric accelerator design for Convolutional Neural Networks

TL;DR: It is shown that the effects of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workloads and ensures that on-chip memory size is minimized, which reduces area and energy usage.
Journal ArticleDOI

Polyhedral parallel code generation for CUDA

TL;DR: A novel source-to-source compiler called PPCG is presented, which introduces a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency according to the constraints of modern GPUs.
Proceedings ArticleDOI

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

TL;DR: This work presents a code generation and auto-tuning framework for stencil computations targeted at multi- and many-core processors, such as multicore CPUs and graphics processing units. It generates compute kernels from a specification of the stencil operation together with a parallelization and optimization strategy, and leverages the auto-tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
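As a concrete picture of what such a framework produces, here is a minimal hand-written 3-point Jacobi sweep in C with a plain OpenMP strategy. This is an illustrative sketch, not actual PATUS output; in PATUS both the kernel and the parallelization strategy would be generated from separate specifications.

```c
/* Minimal 3-point Jacobi sweep: each interior point becomes the
   average of itself and its two neighbors. Hand-written illustration;
   the OpenMP pragma stands in for a parallelization strategy and is
   ignored by compilers built without OpenMP. */
void jacobi_sweep(const double *u, double *u_new, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        u_new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0;
    /* Boundary values pass through unchanged. */
    u_new[0] = u[0];
    u_new[n - 1] = u[n - 1];
}
```

A real stencil generator would additionally apply blocking across sweeps (time tiling) and vectorization, which is exactly the strategy-dependent parameter space the auto-tuner explores.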
Posted Content

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

TL;DR: This work contributes Tensor Comprehensions, a language close to the mathematics of deep learning that offers both imperative and declarative styles; a polyhedral just-in-time compiler that converts a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization; and a compilation cache populated by an autotuner.
Journal ArticleDOI

Polly — performing polyhedral optimizations on a low-level intermediate representation

TL;DR: Polly is presented, an infrastructure for polyhedral optimizations on the compiler's internal, low-level intermediate representation (IR), along with an interface for connecting external optimizers and a novel way of using the parallelism they introduce to generate SIMD and OpenMP code.
References
Book

Theory of Linear and Integer Programming

Journal ArticleDOI

A Generalized inverse for matrices

TL;DR: A generalization of the inverse of a non-singular matrix is described as the unique solution of a certain set of equations; it is used for solving linear matrix equations and for finding an expression for the principal idempotent elements of a matrix.
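For reference, the "certain set of equations" is the four Penrose conditions: the generalized inverse X of a matrix A is the unique matrix satisfying

```latex
AXA = A, \qquad XAX = X, \qquad (AX)^{*} = AX, \qquad (XA)^{*} = XA,
```

where * denotes the conjugate transpose. The unique X satisfying all four conditions is now known as the Moore-Penrose pseudoinverse.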
Proceedings ArticleDOI

A data locality optimizing algorithm

TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
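Of the transformations listed, loop interchange is the simplest to illustrate. The sketch below is a hypothetical example in C, not taken from the paper: both functions copy a matrix, but the interchanged version walks the row-major array contiguously, so each cache line fetched is fully used before it is evicted.

```c
enum { M = 64 };

/* Column-major traversal of a row-major array: successive inner
   iterations touch different rows, i.e. different cache lines. */
void copy_colmajor(const double src[M][M], double dst[M][M]) {
    for (int j = 0; j < M; j++)
        for (int i = 0; i < M; i++)
            dst[i][j] = src[i][j];
}

/* After loop interchange (one of the transformations the algorithm
   applies), the inner loop walks contiguous memory. The loop body and
   the set of iterations are unchanged; only their order differs. */
void copy_rowmajor(const double src[M][M], double dst[M][M]) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            dst[i][j] = src[i][j];
}
```

Interchange is legal here because no iteration depends on another; in general, the algorithm must check data dependences before reordering.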
Journal ArticleDOI

New trends in high performance computing

TL;DR: The automatically tuned linear algebra software (ATLAS) project is described, as well as the fundamental principles that underlie it, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.

Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147)

TL;DR: This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it, with the present emphasis on the Basic Linear Algebra Subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
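The empirical-optimization methodology can be sketched in miniature: time each candidate value of a tuning parameter on the target machine and keep the fastest. Everything below (the blocked-transpose kernel, the candidate list) is a hypothetical toy; real ATLAS searches a far larger space of kernel parameters.

```c
#include <time.h>

enum { AN = 512 };
static double X[AN][AN], Y[AN][AN];

/* One timed trial: blocked matrix transpose with block size b. */
static double time_transpose(int b) {
    clock_t t0 = clock();
    for (int it = 0; it < AN; it += b)
        for (int jt = 0; jt < AN; jt += b)
            for (int i = it; i < it + b && i < AN; i++)
                for (int j = jt; j < jt + b && j < AN; j++)
                    Y[j][i] = X[i][j];
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Empirical search in the ATLAS spirit: run every candidate block
   size on this machine and return the fastest one. */
int pick_block_size(void) {
    const int candidates[] = { 8, 16, 32, 64, 128 };
    const int ncand = sizeof candidates / sizeof candidates[0];
    int best = candidates[0];
    double best_time = time_transpose(candidates[0]);
    for (int c = 1; c < ncand; c++) {
        double t = time_transpose(candidates[c]);
        if (t < best_time) {
            best_time = t;
            best = candidates[c];
        }
    }
    return best;
}
```

Because the winning block size depends on the cache hierarchy of the machine running the search, the result is tuned to that machine, which is exactly the point of the empirical approach.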