Showing papers on "Loop fission published in 2000"

PDF

Open Access

Journal Article•DOI•

[...]

Alain Darte¹•Institutions (1)

01 Aug 2000

TL;DR: The goal of this paper is to study, from a theoretical point of view, several variants of the loop fusion problem -- identifying polynomially solvable cases and NP-complete cases -- and to make the link between these problems and some scheduling problems that arise from completely different areas.

...read moreread less

Abstract: Loop fusion is a program transformation that combines several loops into one. It is used in parallelizing compilers mainly for increasing the granularity of loops and for improving data reuse. The goal of this paper is to study, from a theoretical point of view, several variants of the loop fusion problem – identifying polynomially solvable cases and NP-complete cases – and to make the link between these problems and some scheduling problems that arise from completely different areas. We study, among others, the fusion of loops of different types, and the fusion of loops when combined with loop shifting.

...read moreread less

129 citations

Proceedings Article•DOI•

Tiling Imperfectly-nested Loop Nests

[...]

Nawaaz Ahmed¹, Nikolay Mateev¹, Keshav Pingali¹•Institutions (1)

Cornell University¹

01 Nov 2000

TL;DR: The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space which is tiled to produce the final code.

...read moreread less

Abstract: Tiling is one of the more important transformations for enhancing loca lity of reference in programs. Intuitively, tiling a set of loops achieves the effect of interleaving iterations of these loops. Tiling of perfectly-nested loop nests (which are loop nests in which all assignment statements are contained in the innermost loop) is well understood. In practice, many loop nests are imperfectly nested, so existing compilers use heuristics to try to find a sequence of transformations that convert such loop nests into perfectly-nested ones, but these heuristics do not always succeed. In this paper, we propose a novel approach to tiling imperfectly-nested loop nests. The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space which is tiled to produce the final code. We evaluate the effectiveness of this approach for dense numrical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks. No other single approach in the literature can tile all these codes automatically.

...read moreread less

95 citations

Proceedings Article•DOI•

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

[...]

Nawaaz Ahmed¹, Nikolay Mateev¹, Keshav Pingali¹•Institutions (1)

Cornell University¹

08 May 2000

TL;DR: The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space, which generalizes techniques like code sinking and loop fusion used in ad hoc ways in current compilers to produce perfectly-nested loop nests.

...read moreread less

Abstract: We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like code sinking and loop fusion that are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques however, our embeddings are chosen carefully to enhance locality. The product space is then transformed further to enhance locality, after which fully permutable loops are tiled, and code is generated. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.

...read moreread less

89 citations

Journal Article•DOI•

An efficient interior point method for sequential quadratic programming based optimal power flow

[...]

I.M. Nejdawi, K.A. Clements, P.W. Davis

01 Nov 2000-IEEE Transactions on Power Systems

TL;DR: In this paper, a new sequential quadratic programming algorithm for solving the optimal power flow problem is presented, which is structured with an outer linearization loop and an inner optimization loop.

...read moreread less

Abstract: This paper presents a new sequential quadratic programming algorithm for solving the optimal power flow problem. The algorithm is structured with an outer linearization loop and an inner optimization loop. The inner loop solves a relaxed reduced quadratic programming problem. Because constraint relaxation keeps the inner loop problem of small dimension, the algorithm is quite efficient. Its outer loop iteration counts are comparable to Newton power flow and the inner loops are efficient interior point iterations. Several IEEE test systems were run. The results indicate that both outer and inner loop iteration counts do not vary greatly with problem size.

...read moreread less

70 citations

Journal Article•DOI•

Loop Shifting for Loop Compaction

[...]

Alain Darte¹, Guillaume Huard¹•Institutions (1)

École normale supérieure de Lyon¹

01 Oct 2000-International Journal of Parallel Programming

TL;DR: This paper shows how loop shifting can be optimized so as to minimize both the length of the critical path and the number of dependences for loop compaction and shows that the second optimization is also polynomially solvable with a fast graph algorithm, variant of minimum-cost flow algorithms.

...read moreread less

Abstract: The idea of decomposed software pipelining is to decouple the software pipelining problem into a cyclic scheduling problem without resource constraints and an acyclic scheduling problem with resource constraints. In terms of loop transformation and code motion, the technique can be formulated as a combination of loop shifting and loop compaction. Loop shifting amounts to moving statements between iterations thereby changing some loop independent dependences into loop carried dependences and vice versa. Then, loop compaction schedules the body of the loop considering only loop independent dependences, but taking into account the details of the target architecture. In this paper, we show how loop shifting can be optimized so as to minimize both the length of the critical path and the number of dependences for loop compaction. The first problem is well-known and can be solved by an algorithm due to Leiserson and Saxe. We show that the second optimization (and the combination with the first one) is also polynomially solvable with a fast graph algorithm, variant of minimum-cost flow algorithms. Finally, we analyze the improvements obtained on loop compaction by experiments on random graphs.

...read moreread less

42 citations

Patent•

System and method for dynamically sizing hardware loops and executing nested loops in a digital signal processor

[...]

Sivagnanam Parthasarathy¹•Institutions (1)

STMicroelectronics¹

31 Aug 2000

TL;DR: In this paper, the authors present an apparatus for dynamically sizing a hardware loop that executes a plurality of instruction sequences forming a pluralityof instruction loops, for use in a digital signal processor, which includes an apparatus consisting of N pairs of loop start registers and loop end registers, each loop start register for storing a loop start address and each loop end register for storage a loop end address.

...read moreread less

Abstract: There is disclosed, for use in a digital signal processor, an apparatus for dynamically sizing a hardware loop that executes a plurality of instruction sequences forming a plurality of instruction loops. The apparatus comprises: 1) N pairs of loop start registers and loop end registers, each loop start register for storing a loop start address and each loop end register for storing a loop end address; 2) N comparators, each of the N comparators associated with one of the N pairs of loop start registers and loop end registers, wherein each of the N comparators compares a selected one of a first loop start address and a first loop end address to a fetch program counter value to detect one of a loop start hit and a loop end hit; and 3) fetch address generation circuitry for detecting the loop start hit and the loop end hit and fetching from an address in a program memory an instruction associated with one of the loop start hit and the loop end hit and loading the fetched instruction into the hardware loop.

...read moreread less

32 citations

Book Chapter•DOI•

Loop Termination Prediction

[...]

Timothy Sherwood¹, Brad Calder¹•Institutions (1)

University of California¹

16 Oct 2000

TL;DR: A simple hardware extension to existing prediction architectures called Loop Termination Prediction is presented, which captures the long regular repeating patterns of loops and a software technique called Branch Splitting is examined, which breaks loops with iteration counts above the detection of current predictors into smaller loops that may be effectively captured.

...read moreread less

Abstract: Deeply pipelined high performance processors require highly accurate branch prediction to drive their instruction fetch. However there remains a class of events which are not easily predictable by standard two level predictors. One such event is loop termination. In deeply nested loops, loop terminations can account for a significant amount of the mispredictions. We propose two techniques for dealing with loop terminations. A simple hardware extension to existing prediction architectures called Loop Termination Prediction is presented, which captures the long regular repeating patterns of loops. In addition, a software technique called Branch Splitting is examined, which breaks loops with iteration counts above the detection of current predictors into smaller loops that may be effectively captured. Our results show that for many programs adding a small loop termination buffer can reduce the missprediction rate by up to a difference of 2%.

...read moreread less

32 citations

Patent•

Linear power control loop

[...]

Jr. Normand T. Lemay¹, Brian T. Brunn¹, John Macconnell¹, Eric Sadowski¹, Eric W. Lofstad¹ - Show less +1 more•Institutions (1)

Itron¹

17 Feb 2000

TL;DR: In this article, a closed loop system that utilizes a nonlinear reference to control a power amplifier's output power in order to obtain a linear transfer function of dB per adjustment step of a reference input is presented.

...read moreread less

Abstract: The present invention presents a closed loop system that utilizes a non-linear reference to control a power amplifier's output power in order to obtain a linear transfer function of dB per adjustment step of a reference input. The closed loop system demonstrates that each non-linear stage/step in an automatic gain control system can create a linear closed loop system when using a non-linear reference. The closed loop system of the present invention eliminates the need for a linearization circuit for the system's power detector. The closed loop system may be used with most power amplifiers when linear control in terms of dB vs. adjustment setting of the input reference signal is desired. Output power in terms of dBms can be accurately set in linear steps where power control over a wide dynamic range is desired.

...read moreread less

31 citations

Patent•

Multiple execution of instruction loops within a processor without accessing program memory

[...]

Laurent Wojcieszak¹, Andrew Cofler¹•Institutions (1)

STMicroelectronics¹

02 May 2000

TL;DR: In this paper, a method of executing loops in a computer system is described, where a sequence of instructions held in program memory and a prefetch buffer which holds instructions fetched from the memory ready for supply to a decoder of the computer system.

...read moreread less

Abstract: A method of executing loops in a computer system is described. The computer system has a sequence of instructions held in program memory and a prefetch buffer which holds instructions fetched from the memory ready for supply to a decoder of the computer system. If the size of the loop to be executed is such that it can by holly contained within the prefetch buffer, this is detected and a lock is put on the prefetch buffer to retain the loop within it while the loop is executed a requisite number of times. This thus allows power to be saved and reduces the overhead on the memory access buffers. According to another aspect, loops can be “skipped” by holding a value of zero in the loop counter register.

...read moreread less

30 citations

Journal Article•DOI•

A unified framework for optimizing locality, parallelism, and communication in out-of-core computations

[...]

Mahmut Kandemir¹, Alok Choudhary², J. Ramanujam³, Meenakshi A. Kandaswamy⁴•Institutions (4)

Pennsylvania State University¹, Northwestern University², Louisiana State University³, Intel⁴

01 Jul 2000-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A unified framework that optimizes out-of-core programs by exploiting locality and parallelism, and reducing communication overhead, and extending the base algorithm to work with file layout constraints and show how it is useful for optimizing programs that consist of multiple loop nests.

...read moreread less

Abstract: This paper presents a unified framework that optimizes out-of-core programs by exploiting locality and parallelism, and reducing communication overhead. For out-of-core problems where the data set sizes far exceed the size of the available in-core memory, it is particularly important to exploit the memory hierarchy by optimizing the I/O accesses. We present algorithms that consider both iteration space (loop) and data space (file layout) transformations in a unified framework. We show that the performance of an out-of-core loop nest containing references to out-of-core arrays can be improved by using a suitable combination of file layout choices and loop restructuring transformations. Our approach considers array references one-by-one and attempts to optimize each reference for parallelism and locality. When there are references for which parallelism optimizations do not work, communication is vectorized so that data transfer can be performed before the innermost loop. Results from hand-compiles on IBM SP-2 and Inter Paragon distributed-memory message-passing architectures show that this approach reduces the execution times and improves the overall speedups. In addition, we extend the base algorithm to work with file layout constraints and show how it is useful for optimizing programs that consist of multiple loop nests.

...read moreread less

28 citations

Patent•

Method and apparatus for modulo scheduled loop execution in a processor architecture

[...]

Wen-mei W. Hwu, Matthew C. Merten

01 Dec 2000

TL;DR: In this paper, a processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active is presented.

...read moreread less

Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution. The basic design of the invention involves including a plurality of buffers for storing loop instructions, each of which is associated with an instruction decoder and its respective functional unit, in the dispatch stage of a processor. Control logic is used to receive loop setup parameters and to control the selective issue of instructions from the buffers to the functional units.

...read moreread less

Proceedings Article•DOI•

Transients in reconfigurable control loops

[...]

Gyula Simon, Tamas Kovacshazy¹, Gábor Péceli¹•Institutions (1)

Budapest University of Technology and Economics¹

01 May 2000

TL;DR: In this paper, transient management techniques are investigated in a scenario where the reconfiguration or replacement of a forward-loop controller is required due to changes in the environment or in the plant.

...read moreread less

Abstract: In this paper transient management techniques are investigated in a scenario where the reconfiguration or replacement of a forward-loop controller is required due to changes in the environment or in the plant. Such a change in the closed control loop may have undesirable transient effects, which may degrade the performance of the controlled system. Since the transient cancellation and reduction schemes used in open-loop systems can not be used here, new solutions are proposed for run-time transient handling.

...read moreread less

Proceedings Article•DOI•

An investigation of feedback guided dynamic scheduling of nested loops

[...]

David Hancock¹, J.M. Bull, Rupert W. Ford, T. L. Freeman•Institutions (1)

University of Manchester¹

21 Aug 2000

TL;DR: Three alternative ways of scheduling nested loops are described; two are based on reducing the nested loops to a single loop and applying one-dimensional techniques; the third addresses the multidimensionality of the nested loop directly.

...read moreread less

Abstract: In previous papers (J.M. Bull, 1998; J.M. Bull et al., 1996; R.W. Ford et al., 1994) feedback guided loop scheduling algorithms have been shown to be very effective for certain loop scheduling problems. In particular they perform well for problems that involve a sequential outer loop and a parallel inner loop, and timing information gathered during one execution of the parallel inner loop can be used to inform the scheduling of the subsequent execution of this loop. The authors consider the extension of these feedback guided scheduling algorithms to the more important case of nested parallel loops, again within a sequential outer loop. We describe three alternative ways of scheduling nested loops; two are based on reducing the nested loops to a single loop and applying one-dimensional techniques; the third addresses the multidimensionality of the nested loops directly.

...read moreread less

Book Chapter•DOI•

A Framework for Loop Distribution on Limited On-Chip Memory Processors

[...]

Lei Wang¹, Waibhav Tembe¹, Santosh Pande¹•Institutions (1)

University of Cincinnati¹

25 Mar 2000

TL;DR: This work proposes a framework for analyzing the flow of values and their re-use in loop nests to minimize data traffic under the constraints of limited on-chip memory capacity and dependences and develops a greedy algorithm which traverses the program dependence graph (PDG) to group statements together under the same loop nest legally.

...read moreread less

Abstract: This work proposes a framework for analyzing the flow of values and their re-use in loop nests to minimize data traffic under the constraints of limited on-chip memory capacity and dependences. Our analysis first undertakes fusion of possible loop nests intra-procedurally and then performs loop distribution. The analysis discovers the closeness factor of two statements which is a quantitative measure of data traffic saved per unit memory occupied if the statements were under the same loop nest over the case where they are under different loop nests. We then develop a greedy algorithm which traverses the program dependence graph (PDG) to group statements together under the same loop nest legally. The main idea of this greedy algorithm is to transitively generate a group of statements that can legally execute under a given loop nest that can lead to a minimum data traffic. We implemented our framework in Petit, a tool for dependence analysis and loop transformations. We show that the benefit due to our approach results in eliminating as much as 30 % traffic in some cases improving overall completion time by a 23.33 % for processors such as TI's TMS320C5x.

...read moreread less

Proceedings Article•

A loop transformation approach for combined parallelization and data transfer and storage optimization

[...]

Koen Danckaert, Francky Catthoor, Hugo De Man

01 Jan 2000

TL;DR: A new loop transformation approach to combine parallelization and data transfer and storage optimization for embedded multimedia applications using an extended polytope model, with an exact mathematical description of all operations and dependencies.

...read moreread less

Abstract: We show a new loop transformation approach to combine parallelization and data transfer and storage optimization for embedded multimedia applications. Our methodology makes use of an extended polytope model, with an exact mathematical description of all operations and dependencies. For the data transfer and storage exploration, we use a two step approach, consisting of a polytope placement step and an ordering step. We will show that an early parallelization has to be done between these two steps, in order to achieve a powerful combination of storage optimization and

...read moreread less

Book Chapter•DOI•

Techniques for Effectively Exploiting a Zero Overhead Loop Buffer

[...]

Gang-Ryung Uh¹, Yuhong Wang², David Whalley², Sanjay Jinturkar¹, Chris Burns¹, Vincent Cao¹ - Show less +2 more•Institutions (2)

Alcatel-Lucent¹, Florida State University²

25 Mar 2000

TL;DR: This paper describes strategies for generating code to effectively use a Zero Overhead Loop Buffer and finds that many common improving transformations used by optimizing compilers to improve code on conventional architectures can be exploited to allow more loops to be placed in a ZOLB.

...read moreread less

Abstract: A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequence of instructions that will be executed a specified number of times. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. The authors have found that many common improving transformations used by optimizing compilers to improve code on conventional architectures can be exploited (1) to allow more loops to be placed in a ZOLB, (2) to further reduce loop overhead of the loops placed in a ZOLB, and (3) to avoid redundant loading of ZOLB loops. The results given in this paper demonstrate that this architectural feature can often be exploited with substantial improvements in execution time and slight reductions in code size.

...read moreread less

Proceedings Article•DOI•

Memory system energy (poster session): influence of hardware-software optimizations

[...]

G. Esakkimuthu¹, N. Vijaykrishnan¹, Mahmut Kandemir¹, Mary Jane Irwin¹•Institutions (1)

Pennsylvania State University¹

01 Aug 2000

TL;DR: This paper provides a quantitative comparison and evaluation of the interaction of two hardware cache optimization mechanisms and three widely used compiler optimization techniques and shows that hardware optimization becomes more critical for on-chip cache energy reduction when executing optimized codes.

...read moreread less

Abstract: Memory system usually consumes a significant amount of energy in many battery-operated devices. In this paper, we provide a quantitative comparison and evaluation of the interaction of two hardware cache optimization mechanisms (block buffering and sub-banking) and three widely used compiler optimization techniques (linear loop transformation, loop tiling, and loop unrolling). Our results show that the pure hardware optimizations (eight block buffers and four sub-banks in a 4K, 2-way cache) provided up to 4% energy saving, with an average saving of 2% across all benchmarks. In contrast, the pure software optimization approach that uses all three compiler optimizations, provided at least 23% energy saving, with an average of 62%. However, a closer observation reveals that hardware optimization becomes more critical for on-chip cache energy reduction when executing optimized codes.

...read moreread less

Tiling Imperfectly-nested Loop Nests (REVISED)

[...]

Nawaaz Ahmed, Nikolay Mateev, Keshav Pingali

31 Jan 2000

TL;DR: The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space, which is constrained so that the resulting product space can be legally tiled.

...read moreread less

Abstract: Tiling is one of the more important transformations for enhancing locality of reference in programs. Tiling of perfectly-nested loop nests (which are loop nests in which all assignment statements are contained in the innermost loop) is well understood. In practice, most loop nests are imperfectly-nested, so existing compilers heuristically try to find a sequence of transformations that convert such loop nests into perfectly-nested ones but not always succeed. In this paper, we propose a novel approach to tiling imperfectly-nested loop nests. The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space. The set of possible embeddings is constrained so that the resulting product space can be legally tiled. From this set we choose embeddings that enhance data reuse. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks. No other single approach in the literature can tile all these codes automatically.

...read moreread less

Journal Article•DOI•

A Loop Transformation Algorithm for Communication Overlapping

[...]

Kazuaki Ishizaki, Hideaki Komatsu, Toshio Nakatani

01 Apr 2000

TL;DR: A compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation is presented.

...read moreread less

Abstract: Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. It also describes a method of generating messages to exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on distributed memory machines, such as the RISC System/6000 Scalable POWERparallel System. This paper also discusses the architectural problems of efficient optimization.

...read moreread less

Book Chapter•DOI•

Feedback Guided Scheduling of Nested Loops

[...]

T. L. Freeman¹, David Hancock¹, J. Mark Bull², Rupert W. Ford¹•Institutions (2)

University of Manchester¹, University of Edinburgh²

18 Jun 2000

TL;DR: Four feedback guided algorithms for scheduling nested loops are described and evaluated and the performances of the algorithms are evaluated on a set of synthetic benchmarks.

...read moreread less

Abstract: In earlier papers ([2], [3], [6]) feedback guided loop scheduling algorithms have been shown to be very effective for certain loop scheduling problems which involve a sequential outer loop and a parallel inner loop and for which the workload of the parallel loop changes only slowly from one execution to the next. In this paper the extension of these ideas the case of nested parallel loops is investigated. We describe four feedback guided algorithms for scheduling nested loops and evaluate the performances of the algorithms on a set of synthetic benchmarks.

...read moreread less

Proceedings Article•DOI•

Model estimation and controller reduction: dual closed loop identification problems

[...]

I.D. Laundau, Alireza Karimi

12 Dec 2000

TL;DR: In this article, the plant model identification in closed loop using closed loop output error identification algorithms and the direct estimation in closed-loop of a reduced-order controller feature a duality character.

...read moreread less

Abstract: Algorithms for direct controller reduction by identification in closed loop have been previously proposed (Landau and Karimi, 2000, and Karimi and Landau, 2000). In this paper it is shown that the plant model identification in closed loop using closed loop output error identification algorithms and the direct estimation in closed loop of a reduced order controller feature a duality character. Basic schemes, algorithms and properties of the algorithms can be directly obtained by interchanging the plant model and the controller. In the last part of the paper the interaction between plant model identification in closed loop and direct controller reduction is emphasized.

...read moreread less

Compiler algorithms for efficient use of memory systems

[...]

Yonghong Song, Zhiyuan Li

01 Jan 2000

TL;DR: A memory cost model is developed to characterize the cache reuse and an execution cost model to estimate the execution time for imperfectly-nested loops so that the utilization of cache memories and the translation lookaside buffer is enhanced.

...read moreread less

Abstract: This thesis investigates compiler algorithms to transform program and data to utilize efficiently the underlying memory systems. Despite extensive studies for locality enhancement for perfectly-nested loops, little work has been done for imperfectly-nested loops. In this thesis, two such techniques are presented. The first technique is to the imperfectly-nested loops so that the utilization of cache memories and the translation lookaside buffer (TLB) is enhanced. We develop a memory cost model to characterize the cache reuse and an execution cost model to estimate the execution time. Array duplication, which helps remove false dependences, is applied whenever beneficial. Speculative execution is used to overcome premature exits for certain applications. By tiling the outer loop, which encloses several perfectly-nested loops, the locality across different inner loops as well as the outer loop itself is exploited. The second technique is to contract the temporary storage used in computation without changing the program's semantics. Enabled by loop shifting and loop fusion, the memory reduction technique can enhance locality because of two factors, namely the reduced reference window size after fusion and the reduced cache pressure after array contraction. We formulate the memory reduction problem as a graph-based problem. Transformed to a network flow problem, it is polynomial-time solvable. Both techniques are implemented in a research compiler, Panaroma. The experimental results demonstrate how effective our techniques can be both in boosting cache utilization and in performance improvement.

...read moreread less

Journal Article•DOI•

Precision Positioning Which Contains Flexible Mechanical System. Fusion of Semi-closed and Full-closed Loop Control.

[...]

Fumitoshi Sakai, Masatoshi Hikizu, Yoshitsugu Kamiya, Hiroaki Seki

01 Jan 2000-Journal of The Japan Society for Precision Engineering

TL;DR: In this article, a method of fusing a semi-closed control into a full-closed loop control is presented, and this technique solves mutual faults which a semi closed and a full closed control have and it gets a high positional accuracy.

...read moreread less

Abstract: Generally, for the position control of industrial machines, a semi-closed loop control which uses only the information of motor's angle is adopted, since it contributes to the stability of the control system. In this method, however, precise positioning cannot be attained because of the some non-linear elements in the driving mechanism. Also, by the full-closed loop control which uses only the information of load's angle, it is difficult to keep the stability because of containing a flexibility between the sensor and the actuator, so it is necessary to control the system at low gain, and the positional accuracy is declined. Therefore, this study shows a method of fusing a semi-closed loop control into a full-closed loop control, and this technique solves mutual faults which a semi-closed and a full-closed loop control have and it gets a high positional accuracy. Also, the effectivity about this control technique is shown by the experiment.

...read moreread less

Journal Article•DOI•

Open and Closed Loop Process Identification by a Phase Locked Loop Identifier Module

[...]

J. Crowe¹, Michael A. Johnson²•Institutions (2)

Hoffmann-La Roche¹, University of Strathclyde²

01 Jun 2000-IFAC Proceedings Volumes

TL;DR: In this paper, the authors report a more flexible and more accurate method of finding system frequency response data including the system ultimate data. But the method is limited to three term controller tuning in the process industries.

...read moreread less

Book Chapter•DOI•

Advanced Scalarization of Array Syntax

[...]

Gerald Roth¹•Institutions (1)

Gonzaga University¹

25 Mar 2000

TL;DR: Experimental results show that the analysis strategy presented can significantly improve the runtime performance of compiled code, while at the same time improving the performance of the compiler itself.

...read moreread less

Abstract: One task of all Fortran 90 compilers is to scalarize the array syntax statements of a program into equivalent sequential code. Most compilers require multiple passes over the program source to ensure correctness of this translation, since their analysis algorithms only work on the scalarized form. These same compilers then make additional subsequent passes to perform loop optimizations such as loop fusion. In this paper we discuss a strategy that is capable of making advanced scalarization and fusion decisions at the array level. We present an analysis strategy that supports our advanced scalarizer, and we describe the benefits of this methodology compared to the standard practice. Experimental results show that our strategy can significantly improve the runtime performance of compiled code, while at the same time improving the performance of the compiler itself.

...read moreread less

Application of high-level memory size estimation for guidance of loop transformations in multimedia design

[...]

Per Gunnar Kjeldsberg, Francky Catthoor, Einar J. Aas

01 Jan 2000

TL;DR: This paper demonstrates how a novel technique for highlevel memory requirement estimation can be used in system level synthesis for data-dominated multimedia applications using a polyhedral description of partitioned arrays and their dependencies.

...read moreread less

Abstract: ABSTRACT In this paper, we demonstrate how a novel technique for highlevel memory requirement estimation can be used in system level synthesis for data-dominated multimedia applications. Using a polyhedral description of partitioned arrays and their dependencies, guiding hints for the loop ordering are presented to the designer. Our key contribution consists of estimates on the upper and lower bounds of the memory size requirement with a partially fixed execution ordering. These are used in the early system design trajectory to find an implementation with low memory requirement. The methodology is demonstrated using a representative multimedia application.

...read moreread less

Proceedings Article•DOI•

A new model of exploiting loop parallelization using knowledge-based techniques

[...]

Chao-Tung Yang, Shian-Shyong Tseng, Chang-Jiun Tsai, Cheng-Der Chuang, Sun-Wen Chuang - Show less +1 more

04 Jul 2000

TL;DR: A new model of exploiting loop parallelization by using knowledge-based techniques is first proposed, which can achieve higher speedup on parallelizing compilers and for system maintenance and extensibility, is obviously superior to others.

...read moreread less

Abstract: We concentrate on three fundamental phases, data dependence testing, parallel loop transformation, and parallel loop scheduling, for loop parallelization in parallelizing compilers, running on multiprocessor systems. A new model of exploiting loop parallelization by using knowledge-based techniques is first proposed. The knowledge-based approach integrates existing data dependence tests, loop transformations and loop schedules, to make good use of their abilities for extracting more parallelism. Three rule-based systems, called the K-Test, IPLS and KPLT, are then developed by repertory grid analysis and an attribute ordering table to construct the knowledge base, respectively. These systems can choose an appropriate test, transform and schedule, then apply the resulting methods to perform loop parallelization and gain a high speedup rate. For example, the KPLT can choose the appropriate loop transformations to reorder the execution of statements and loop iterations for parallelization. Unlike the previous researches that must use the one-pass approach, we introduce the idea of multipass which may explore more parallelism of loops. Experimental results show that our new model can achieve higher speedup on parallelizing compilers. Furthermore, for system maintenance and extensibility, our approach is obviously superior to others.

...read moreread less

Patent•

Test loop formation method for fixed digital telecommunication networks using bit error rate measurement

[...]

Kesler Steffen¹, Prautsch Daniel¹•Institutions (1)

Deutsche Telekom¹

05 Oct 2000

TL;DR: In this article, the interface converter makes available the signal necessary for the loop formation and bit pattern recognition, that the circuit loop formation recognizes or checks, and the transferred bit pattern places or splits a loop depending on pattern.

...read moreread less

Abstract: The interface converter (2) makes available the signal necessary for the loop formation and the bit pattern recognition, that the circuit loop formation (3) recognizes or checks. The transferred bit pattern places or splits a loop depending on pattern. An Independent claim is also included for a test loop forming device.

...read moreread less

Patent•

Troubleshooting method in digital transmission system, involves transferring instruction and measurement sequences separately and cyclically to loop device at output of network termination for test loop execution

[...]

Wolfgang Baschant, Alexander Herlitz, Walter Haas, Johann Rottmann, Robert Sturm - Show less +1 more

23 Feb 2000

TL;DR: In this article, a troubleshooting device transfers instruction sequences and measurement sequences separately and cyclically through one of the useful channels of digital transmission system, to a loop device (RTL) arranged at the output of network termination (NT).

...read moreread less

Abstract: A troubleshooting device transfers instruction sequences and measurement sequences separately and cyclically through one of the useful channels of digital transmission system, to a loop device (RTL) arranged at the output of network termination (NT). The instruction sequence includes one non-confusable signature for identifying the loop device and final instruction for executing the test loop. An independent claim is also included for trouble shooting device.

...read moreread less

Proceedings Article•DOI•

A scalable loop optimization approach for scalable DSP processors

[...]

Jian Wang¹, Bogong Su, Erh-Wen Hu•Institutions (1)

Nortel¹

05 Jun 2000

TL;DR: This paper proposes the possibility of reuse of the existing optimized DSP code on a scalable high-performance VLIW DSP processor by first performing a loop alignment transformation on the source level and then reuse the existing optimize loop code on the assembly level.

...read moreread less

Abstract: This paper proposes the possibility of reuse of the existing optimized DSP code on a scalable high-performance VLIW DSP processor. Since loops are the critical paths in most DSP applications, we focus on issues related to loop optimization. In our approach, we first perform a loop alignment transformation on the source level; we then reuse the existing optimized loop code on the assembly level. The approach is highly portable because it is independent of DSP hardware details. It can be used directly by a DSP programmer on the source level and/or by a DSP compiler designer to implement independent optimization modules.

...read moreread less