Topic

Loop fission

About: Loop fission (also known as loop distribution) is a compiler loop transformation that splits a single loop into several loops, each iterating over the same index range but executing only part of the original loop body. Over the lifetime of the topic, 833 publications have been published, receiving 20,108 citations.
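
As a minimal sketch of the transformation itself (illustrative code only, not drawn from any paper listed below; the array names and sizes are arbitrary assumptions):

#include <stdio.h>

#define N 8

/* Original loop: two statements share one loop body. */
void combined(double a[N], double b[N], const double c[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = c[i] * 2.0;      /* statement S1 */
        b[i] = a[i] + c[i];     /* statement S2 (reads a[i] written by S1) */
    }
}

/* After loop fission: each statement gets its own loop over the same
 * index range.  The transformation is legal here because the only
 * dependence (S1 -> S2 on a[i]) is preserved: the first loop finishes
 * before the second one starts reading a[]. */
void fissioned(double a[N], double b[N], const double c[N])
{
    for (int i = 0; i < N; i++)
        a[i] = c[i] * 2.0;

    for (int i = 0; i < N; i++)
        b[i] = a[i] + c[i];
}

int main(void)
{
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++)
        c[i] = (double)i;

    fissioned(a, b, c);
    for (int i = 0; i < N; i++)
        printf("%g ", b[i]);        /* prints 0 3 6 ... */
    printf("\n");
    return 0;
}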


Papers
Proceedings ArticleDOI
01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
Abstract: This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.

1,352 citations
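
As a rough illustration of the kind of restructuring this paper's framework produces, below is a hand-tiled matrix multiplication. This is only a sketch: the paper derives such tilings automatically from its reuse/locality model, and the tile size BLK used here is an assumed value rather than anything computed by the algorithm.

#include <stddef.h>

#define N   512
#define BLK 64   /* assumed tile size; in practice chosen from cache capacity */

/* Naive i-j-k matrix multiplication: C += A * B. */
void matmul(double C[N][N], const double A[N][N], const double B[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Tiled (blocked) version: the loops are strip-mined and interchanged so
 * that each innermost triple loop works on BLK x BLK tiles, keeping the
 * working set small enough that reuse of A, B, and C hits in cache.
 * (N is assumed to be a multiple of BLK.) */
void matmul_tiled(double C[N][N], const double A[N][N], const double B[N][N])
{
    for (size_t ii = 0; ii < N; ii += BLK)
        for (size_t jj = 0; jj < N; jj += BLK)
            for (size_t kk = 0; kk < N; kk += BLK)
                for (size_t i = ii; i < ii + BLK; i++)
                    for (size_t j = jj; j < jj + BLK; j++)
                        for (size_t k = kk; k < kk + BLK; k++)
                            C[i][j] += A[i][k] * B[k][j];
}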

Book
16 Jun 1995
TL;DR: This book discusses Programming Language Features, Data Dependence, Dependence System Solvers, and Run-time Dependence Testing for High Performance Systems.
Abstract (table of contents):
1. High Performance Systems. An Example Program: Matrix Multiplication. Structure of a Compiler.
2. Programming Language Features. Languages for High Performance. Sequential and Parallel Loops. Roundoff Error.
3. Basic Graph Concepts. Sets, Tuples, Logic. Graphs. Control Dependence.
4. Review of Linear Algebra. Real Vectors and Matrices. Integer Matrices and Lattices. Linear System of Equations. System of Integer Equations. Systems of Linear Inequalities. Systems of Integer Linear Inequalities. Extreme Values of Affine Functions.
5. Data Dependence. Data Dependence in Loops. Data Dependence in Conditionals. Data Dependence in Parallel Loops. Program Dependence Graph.
6. Scalar Analysis with Factored Use-Def Chains. Constructing Factored Use-Def Chains. FUD Chains for Arrays. Finding All Reaching Definitions. Implicit References in FUD Chains. Induction Variables Using FUD Chains. Constant Propagation with FUD Chains. Data Dependence for Scalars.
7. Data Dependence Analysis for Arrays. Building the Dependence System. Dependence System Solvers. General Solver. Summary of Solvers. Complications. Run-time Dependence Testing.
8. Other Dependence Problems. Array Region Analysis. Pointer Analysis. I/O Dependence. Procedure Calls. Interprocedural Analysis.
9. Loop Restructuring. Simple Transformations. Loop Fusion. Loop Fission. Loop Reversal. Loop Interchanging. Loop Skewing. Linear Loop Transformations. Strip-Mining. Loop Tiling. Other Loop Transformations. Interprocedural Transformations.
10. Optimizing for Locality. Single Reference to Each Array. Multiple References. General Tiling. Fission and Fusion for Locality.
11. Concurrency Analysis. Code for Concurrent Loops. Concurrency from Sequential Loops. Concurrency from Parallel Loops. Nested Loops. Roundoff Error. Exceptions and Debuggers.
12. Vector Analysis. Vector Code. Vector Code from Sequential Loops. Vector Code from Forall Loops. Nested Loops. Roundoff Error, Exceptions, and Debuggers. Multivector Computers.
13. Message-Passing Machines. SIMD Machines. MIMD Machines. Data Layout. Parallel Code for Array Assignment. Remote Data Access. Automatic Data Layout. Multiple Array Assignments. Other Topics.
14. Scalable Shared-Memory Machines. Global Cache Coherence. Local Cache Coherence. Latency Tolerant Machines.
Glossary. References. Author Index. Index. ISBN 0805327304.

1,344 citations
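
Chapter 9 of this book treats loop fusion as the inverse of loop fission; a minimal sketch of fusion (illustrative code, not taken from the book, with arbitrary array names):

#define N 1024

/* Two separate loops over the same index range ... */
void unfused(double a[N], double b[N], const double c[N])
{
    for (int i = 0; i < N; i++)
        a[i] = c[i] + 1.0;
    for (int i = 0; i < N; i++)
        b[i] = c[i] * a[i];
}

/* ... fused into one loop.  Fusion is legal here because the only
 * dependence (a[i] written, then read) stays within a single iteration,
 * and it improves locality: c[i] and a[i] are reused while still in
 * cache or registers instead of being reloaded by a second loop. */
void fused(double a[N], double b[N], const double c[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = c[i] + 1.0;
        b[i] = c[i] * a[i];
    }
}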

Journal ArticleDOI
TL;DR: An upper bound on the difference between the two loops is derived, showing that the approximation of the continuous state-feedback loop by the event-based control loop can be made arbitrarily tight by appropriately choosing the threshold parameter of the event generator.

994 citations

Proceedings ArticleDOI
B. Moore
01 Dec 1975
TL;DR: This paper characterizes the class of all closed-loop eigenvector sets that can be obtained, via state feedback, together with a given set of distinct closed-loop eigenvalues.
Abstract: A characterization is given for the class of all closed loop eigenvector sets which can be obtained with a given set of distinct closed loop eigenvalues using state feedback. It is shown, furthermore, that the freedom one has in addition to specifying the closed loop eigenvalues is precisely this: to choose one set of closed loop eigenvectors from this class. Included in the proof of this result is an algorithm for computing the matrix of feedback gains which gives the chosen closed loop eigenvalues and eigenvectors. A design scheme based on these results is presented which gives the designer considerable freedom to choose the distribution of the modes among the output components. One interesting feature is that the distribution of a mode among the output components can be varied even if the mode is not controllable.

583 citations

Journal ArticleDOI
TL;DR: This article presents compiler optimizations that improve data locality, driven by a simple yet accurate cost model; performance improvements proved difficult to achieve because benchmarks already exhibit high cache hit rates, but the optimizations still significantly improved several programs.
Abstract: In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this article, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments illustrate that for kernels our model and algorithm can select and achieve the best loop structure for a nest. For over 30 complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches; however, our optimizations significantly improved several programs.

566 citations
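
The simplest of the compound transformations this article applies is loop permutation driven by spatial reuse. A sketch of the idea (illustrative code, not taken from the article; array name and size are assumptions), for a row-major C array:

#define N 1024

double x[N][N];

/* Column-major traversal of a row-major array: consecutive accesses are
 * N * sizeof(double) bytes apart, so almost every access touches a new
 * cache line and spatial reuse is wasted. */
void init_poor_locality(void)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] = 0.0;
}

/* After loop permutation (interchange): the innermost loop walks a row
 * with stride 1, so every element of each fetched cache line is used. */
void init_good_locality(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 0.0;
}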


Network Information
Related Topics (5)
Compiler: 26.3K papers, 578.5K citations, 79% related
Distributed algorithm: 20.4K papers, 548.1K citations, 78% related
Cache: 59.1K papers, 976.6K citations, 77% related
Data structure: 28.1K papers, 608.6K citations, 75% related
Scalability: 50.9K papers, 931.6K citations, 75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    4
2022    9
2020    2
2018    3
2017    7
2016    18