scispace - formally typeset
Search or ask a question
Author

Paul Feautrier

Bio: Paul Feautrier is an academic researcher from École normale supérieure de Lyon. The author has contributed to research in topics: Automatic parallelization & Polytope model. The author has an hindex of 24, co-authored 75 publications receiving 3992 citations. Previous affiliations of Paul Feautrier include University of Paris & French Institute for Research in Computer Science and Automation.


Papers
More filters
Journal ArticleDOI
TL;DR: This paper presents an algorithm for analyzing the patterns along which values flow as the execution proceeds, and discusses several applications of the method: conversion of a program to a set of recurrence equations, array and scalar expansion, program verification and parallel program construction.
Abstract: Given a program written in a simple imperative language (assignment statements,for loops, affine indices and loop limits), this paper presents an algorithm for analyzing the patterns along which values flow as the execution proceeds. For each array or scalar reference, the result is the name and iteration vector of the source statement as a function of the iteration vector of the referencing statement. The paper discusses several applications of the method: conversion of a program to a set of recurrence equations, array and scalar expansion, program verification and parallel program construction.

618 citations

Journal ArticleDOI
TL;DR: This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector and presents an algorithm which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.
Abstract: Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In may cases, actions may be labelled by integral vectors in some iterations domains, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution data to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector. An algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.

614 citations

Journal ArticleDOI
TL;DR: In this paper, the analysis semantique des programs informatiques conduit a la resolution de problemes de programmation parametrique entiere, i.e., the problem of finding a parametrization of a program.
Abstract: L'analyse semantique des programmes informatiques conduit a la resolution de problemes de programmation parametrique entiere. L'article s'est ainsi consacre a la construction d'un algorithme de ce type

454 citations

Journal ArticleDOI
TL;DR: This paper extends the algorithms which were developed in Part I to cases in which there is no affine schedule, i.e. to problems whose parallel complexity is polynomial but not linear, and gives some experimental evidence for the applicability, performances and limitations of the algorithm.
Abstract: This paper extends the algorithms which were developed in Part I to cases in which there is no affine schedule, i.e. to problems whose parallel complexity is polynomial but not linear. The natural generalization is to multidimensional schedules with lexicographic ordering as temporal succession. Multidimensional affine schedules, are, in a sense, equivalent to polynomial schedules, and are much easier to handle automatically. Furthermore, there is a strong connection between multidimensional schedules and loop nests, which allows one to prove that a static control program always has a multidimensional schedule. Roughly, a larger dimension indicates less parallelism. In the algorithm which is presented here, this dimension is computed dynamically, and is just sufficient for scheduling the source program. The algorithm lends itself to a “divide and conquer” strategy. The paper gives some experimental evidence for the applicability, performances and limitations of the algorithm.

445 citations

Book ChapterDOI
01 Jan 1996
TL;DR: The aim of this paper is to explain the importance of polytope and polyhedra in automatic parallelization, and shows that the semantics of parallel programs is best described geometrically, as properties of sets of integral points in n-dimensional spaces.
Abstract: The aim of this paper is to explain the importance of polytope and polyhedra in automatic parallelization. We show that the semantics of parallel programs is best described geometrically, as properties of sets of integral points in n-dimensional spaces, where n is related to the maximum nesting depth of DO loops. The needed properties translate nicely to properties of polyhedra, for which many algorithms have been designed for the needs of optimization and operation research. We show how these ideas apply to scheduling, placement and parallel code generation.

217 citations


Cited by
More filters
Proceedings ArticleDOI
01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
Abstract: This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations.The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.

1,352 citations

Journal ArticleDOI
TL;DR: This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran, and describes the purpose of each transformation, how to determine if it is legal, and an example of its application.
Abstract: In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.

946 citations

Proceedings ArticleDOI
07 Jun 2008
TL;DR: An automatic polyhedral source-to-source transformation framework that can optimize regular programs for parallelism and locality simultaneously simultaneously and is implemented into a tool to automatically generate OpenMP parallel code from C program sections.
Abstract: We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.

930 citations

Journal ArticleDOI
27 Jun 2005
TL;DR: SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms.
Abstract: Fast changing, increasingly complex, and diverse computing platforms pose central problems in scientific computing: How to achieve, with reasonable effort, portable optimal performance? We present SPIRAL, which considers this problem for the performance-critical domain of linear digital signal processing (DSP) transforms. For a specified transform, SPIRAL automatically generates high-performance code that is tuned to the given platform. SPIRAL formulates the tuning as an optimization problem and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, SPIRAL "intelligently" generates and explores algorithmic and implementation choices to find the best match to the computer's microarchitecture. The "intelligence" is provided by search and learning techniques that exploit the structure of the algorithm and implementation space to guide the exploration and optimization. SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by SPIRAL competes with, and sometimes outperforms, the best available human tuned transform library code.

853 citations

Journal ArticleDOI
TL;DR: Compilers for vector or multiprocessor computers must have certain optimization features to successfully generate parallel code to be able to operate on parallel systems.
Abstract: Compilers for vector or multiprocessor computers must have certain optimization features to successfully generate parallel code.

758 citations