
Showing papers on "Loop fission published in 1989"


Journal ArticleDOI
01 Apr 1989
TL;DR: The Cydra™ 5 architecture adds unique support for overlapping successive iterations of a loop to a very long instruction word (VLIW) base, allowing highly parallel loop execution for a much larger class of loops than can be vectorized.
Abstract: The Cydra™ 5 architecture adds unique support for overlapping successive iterations of a loop to a very long instruction word (VLIW) base. This architecture allows highly parallel loop execution for a much larger class of loops than can be vectorized, without requiring the unrolling of loops usually used by compilers for VLIW machines. This paper discusses the Cydra 5 loop scheduling model, the special architectural features which support it, and the loop compilation techniques used to take full advantage of the architecture.
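
As a rough illustration of the general idea of overlapping successive loop iterations (independent of the Cydra 5 hardware support described above), the C sketch below hand-pipelines a loop computing a[i] = b[i]*c + d so that the load, the multiply-add, and the store of three different iterations are in flight during each pass of the kernel loop. The function name and the three-stage split are illustrative assumptions, not taken from the paper.

/* Hypothetical illustration: a[i] = b[i] * c + d, split into load,
   multiply-add, and store stages so that three different iterations
   are "in flight" during each pass of the kernel loop.               */
#include <stddef.h>

void saxpy_pipelined(float *a, const float *b, float c, float d, size_t n)
{
    if (n < 2) {                      /* too short to overlap: plain loop */
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] * c + d;
        return;
    }

    float ld   = b[0];                /* prologue: load for iteration 0   */
    float calc = ld * c + d;          /* compute for iteration 0,         */
    ld = b[1];                        /* load for iteration 1             */

    for (size_t i = 2; i < n; i++) {  /* steady state: three iterations overlap */
        a[i - 2] = calc;              /* store for iteration i-2          */
        calc     = ld * c + d;        /* compute for iteration i-1        */
        ld       = b[i];              /* load for iteration i             */
    }

    a[n - 2] = calc;                  /* epilogue: drain the pipeline     */
    a[n - 1] = ld * c + d;
}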

147 citations


Proceedings ArticleDOI
01 Jun 1989
TL;DR: An iterative loop-folding procedure, implemented in the CATHEDRAL II compiler, is presented, which may significantly improve the utilization of the parallel hardware available in a data path.
Abstract: In this paper, we discuss a control-flow transformation called loop folding, applied during the scheduling of register-transfer code for DSP systems. Loop folding is functionally equivalent to data-path pipelining. An iterative loop-folding procedure, implemented in the CATHEDRAL II compiler, is presented. This technique may significantly improve the utilization of the parallel hardware available in a data path.

141 citations


Patent
13 Mar 1989
TL;DR: In this paper, a logic simulator has a time loop with a number of time slots into which events are scheduled, such that event times corresponding to different cycles around the loop may be simultaneously present on the loop.
Abstract: A logic simulator has a time loop with a number of time slots into which events are scheduled. The events are wrapped around the loop, so that event times corresponding to different cycles around the loop may be simultaneously present on the loop. This allows a small loop size to be used, which improves performance. Preferably, the loop size is a prime number. If a complete cycle of the loop is made without finding any non-empty slots, a jump is made to the next event time, so as to speed up the processing. In one described embodiment, the loop size is static, while in a second described embodiment the loop size is dynamically varied to minimize the insertion of events with different event times into the same slot.
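
The sketch below is a minimal C model of a wrap-around timing wheel of the general kind the patent describes: the slot index is the event time modulo the (prime) wheel size, each event keeps its full time so that events from different cycles around the loop can share a slot, and only events whose time matches the current simulation time are fired. The names, the slot count of 127, and the per-slot linked lists are illustrative assumptions, not details from the patent.

#include <stdlib.h>

#define WHEEL_SIZE 127                /* prime loop size, as preferred    */

struct event {
    unsigned long time;               /* full event time, not just a slot */
    void (*action)(void *ctx);
    void *ctx;
    struct event *next;
};

static struct event *wheel[WHEEL_SIZE];   /* one event list per time slot */

/* Events are wrapped around the loop: different event times may land in
   the same slot, so each event carries its absolute time.               */
void wheel_insert(struct event *ev)
{
    unsigned slot = ev->time % WHEEL_SIZE;
    ev->next = wheel[slot];
    wheel[slot] = ev;
}

/* Fire every event scheduled for 'now'; events in the same slot that
   belong to a later cycle around the wheel are left in place.           */
void wheel_fire(unsigned long now)
{
    struct event **p = &wheel[now % WHEEL_SIZE];
    while (*p) {
        if ((*p)->time == now) {
            struct event *ev = *p;
            *p = ev->next;            /* unlink the matching event        */
            ev->action(ev->ctx);
            free(ev);                 /* assumes events are heap-allocated */
        } else {
            p = &(*p)->next;          /* different cycle: skip            */
        }
    }
}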

57 citations


Patent
Leslie D. Kohn1
14 Feb 1989
TL;DR: In this paper, a special purpose instruction is used to reduce the program overhead associated with conditional branching at the end of a program loop by comparing a loop counter with a decrement value.
Abstract: A method and apparatus for providing program loop control in a data processor employs a special purpose instruction that substantially reduces the program overhead associated with conditional branching at the end of a program loop. The instruction first compares a loop counter with a decrement value. If the loop counter has counted down, a loop condition code, which is stored in a dedicated register bit, is cleared. Otherwise, the loop condition code remains set to indicate that further iterations of the loop are required. The decremented value of the loop counter is then stored in a loop counter register. In parallel with decrementing of the loop counter, a conditional branch is executed based on the value of the loop condition code set in the immediately previous iteration of the loop. If the loop condition code is cleared, i.e. if the loop has been completed, program control proceeds to the instruction following the loop after execution of the next instruction in sequence. Conversely, if the loop condition code is set, program control returns to the branch address, i.e. the beginning of the loop, after execution of the next instruction in sequence. All of the operations of the present invention are performed within a single instruction cycle.
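
The C function below is a loose, register-level model of the instruction semantics as the abstract states them: the counter is decremented and the loop condition code updated, while the branch decision is taken from the condition code left by the previous iteration, so the two can proceed in parallel in hardware. The structure fields, the exact compare/decrement ordering, and the handling of the delay-slot instruction are guesses made for illustration only.

/* A rough model of the loop-control instruction; names are invented.   */
struct loop_state {
    long     counter;       /* loop counter register                     */
    int      cc;            /* dedicated loop condition code bit         */
    unsigned pc;            /* address of the instruction to run after
                               the delay-slot instruction                */
};

/* One execution of the instruction.  'step' is the decrement value and
   'branch_addr' the top-of-loop address.                                */
void loop_branch(struct loop_state *s, long step, unsigned branch_addr,
                 unsigned fall_through_addr)
{
    int take_branch = s->cc;          /* decided by the previous iteration */

    s->counter -= step;               /* decrement and compare             */
    s->cc = (s->counter > 0);         /* cleared once the loop counts down */

    /* the next sequential instruction still executes (delay slot); then:  */
    s->pc = take_branch ? branch_addr : fall_through_addr;
}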

37 citations


Patent
23 Jan 1989
TL;DR: In this article, a data flow type information processor includes a program storing portion, a data pair producing portion, and a processing portion, together with a function for synchronizing on all of the loop variables.
Abstract: A data flow type information processor includes a program storing portion, a data pair producing portion and a processing portion. When the processor executes a data flow program having a loop structure, a function for synchronizing on all of the loop variables, that is, a function for assuring that the values of all of the loop variables have been determined in the loop execution stage under consideration, is applied to the group of instruction information that determines loop termination.

33 citations


Patent
24 Apr 1989
TL;DR: In this paper, the authors present a horizontal computer for execution of an instruction loop with overlapped code, which includes a plurality of processors, a multiconnect unit for storing operands, an instruction unit for specifying address offsets and operations to be performed by the processors, and an invariant address unit for combining the address offsets with a modifiable pointer to form source and destination addresses.
Abstract: A horizontal computer for execution of an instruction loop with overlapped code. The computer includes a plurality of processors, a multiconnect unit for storing operands for the processors, an instruction unit for specifying address offsets and operations to be performed by the processors, and an invariant address unit for combining the address offsets with a modifiable pointer to form source and destination addresses in the multiconnect unit. The instruction unit enables different ones of the processors as a function of which iteration of the loop is being executed, for example by means of processor control circuitry or by selectively providing instructions to the processors, so that different operations are performed during different iterations.
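
The fragment below is a toy C model of the addressing scheme the abstract describes: each instruction carries a fixed offset, the invariant address unit combines that offset with a modifiable pointer to form an address into the multiconnect storage, and advancing the pointer once per iteration makes the same (unchanged) loop body touch fresh locations on each iteration. The size, names, and direction of pointer movement are assumptions made for illustration.

#define MC_SIZE 64                       /* illustrative multiconnect size */

static double   multiconnect[MC_SIZE];
static unsigned mc_ptr;                  /* modifiable pointer             */

/* Source/destination address = instruction offset combined with pointer. */
static unsigned mc_addr(unsigned offset)
{
    return (mc_ptr + offset) % MC_SIZE;
}

static double mc_read(unsigned offset)            { return multiconnect[mc_addr(offset)]; }
static void   mc_write(unsigned offset, double v) { multiconnect[mc_addr(offset)] = v; }

/* Advance the pointer once per loop iteration so the offsets encoded in
   the invariant loop code map onto different locations each iteration.   */
static void mc_advance(void)
{
    mc_ptr = (mc_ptr + MC_SIZE - 1) % MC_SIZE;
}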

20 citations


Patent
Vaughn L. Mower1
20 Nov 1989
TL;DR: In this article, a fast acquisition coherent code tracking loop for use in direct sequence spread spectrum systems is provided with an embedded frequency offset loop containing a pair of multipliers: one is coupled to the carrier tracking loop through a scaling circuit, and the other to the output of the carrier tracking loop's highly stable VCO, providing extremely fast phase acquisition of the received PN code.
Abstract: A fast acquisition coherent code tracking loop for use in direct sequence spread spectrum systems is provided with an embedded frequency offset loop. The frequency offset loop in the code tracking loop is provided with a pair of multipliers: one is coupled to the carrier tracking loop through a scaling circuit, and the second is coupled to the output of the highly stable VCO of the carrier tracking loop, providing extremely fast phase acquisition of the received PN code and very high frequency stability of the code tracking loop.

20 citations


Journal ArticleDOI
TL;DR: It is shown that this model-based double iterative loop strategy has an important practical advantage in that it reduces the required number of set point changes to real subprocesses in order to achieve optimality.

15 citations


Proceedings ArticleDOI
01 Aug 1989
TL;DR: Loop spreading restructures parallel loops to balance parallel tasks on multiple processors; this work shows how the method keeps the performance of matrix multiplication and a simplex algorithm from decreasing as the size of the input changes.
Abstract: When the number of processors P is less than the number of tasks N in a parallel loop, the loop has to be executed in ⌈N/P⌉ rounds and the last round executes only (N mod P) tasks. In many cases, in the last round all but a few processors are idle, which causes a significant drop in performance. This performance drop becomes more and more detrimental as the number of processors increases. Loop spreading is a technique for restructuring parallel loops so as to balance parallel tasks on multiple processors. A spread loop runs at least as fast as the non-spread loop even when N mod P = 0, and shows no performance drop when N changes. We show how the method keeps the performance of the matrix multiplication and a simplex algorithm from decreasing as the size of input changes.
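
The small C program below illustrates the imbalance the paper targets, not the paper's spreading transformation itself: a round-by-round schedule of N tasks on P processors runs ceil(N/P) rounds with only N mod P processors busy in the last round, whereas a balanced partition gives every processor either floor(N/P) or ceil(N/P) tasks. The example sizes are arbitrary.

#include <stdio.h>

int main(void)
{
    int N = 10, P = 4;                   /* arbitrary example sizes        */

    int rounds = (N + P - 1) / P;        /* ceil(N/P) rounds               */
    int last   = N % P ? N % P : P;      /* processors busy in last round  */
    printf("round-based: %d rounds, last round uses %d of %d processors\n",
           rounds, last, P);

    /* A balanced partition keeps every processor busy: each one gets
       either floor(N/P) or ceil(N/P) tasks.                              */
    for (int p = 0; p < P; p++) {
        int tasks = N / P + (p < N % P ? 1 : 0);
        printf("processor %d gets %d tasks\n", p, tasks);
    }
    return 0;
}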

6 citations


Patent
21 Oct 1989
TL;DR: In this paper, the authors propose an approach to reduce the overhead of loop execution by placing the decrement, compare, and branch-to-top instructions in hardware, reducing the number of instructions in the loop and speeding loop execution.
Abstract: Method and apparatus to avoid the code space and time overhead of the software loop. Loops (repeatedly executed blocks of instructions) are often used in software and microcode. Loops may be employed for array manipulation, storage initialization, division and square-root interpretation, and micro-interpretation of instructions with variable-length operands. Software creates loops by keeping an iteration count in a register or in memory. During each iteration of the code loop, software decrements the count, and then branches to the "top" of the loop if the count remains nonzero. This apparatus puts the decrement, compare, and branch-to-top into hardware, reducing the number of instructions in the loop and speeding loop execution. Hardware further speeds loop execution by eliminating the wait for the branch to the top-of-loop instruction; that is, it prefetches the top-of-loop instruction near the bottom of the loop. The loop may be initialized for a fixed iteration count, or can accept a variable count in the iteration count register. The apparatus consists of counters for the number of instructions in the loop, an iteration counter, a pointer to the top-of-loop location, and an instruction to initiate the loop.
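
The sketch below is a rough C model of the loop bookkeeping the abstract lists: a top-of-loop pointer, a count of the instructions in the loop, and an iteration counter, used so that no decrement/compare/branch instructions appear in the loop body and the top-of-loop instruction can be fetched as soon as the bottom of the loop is reached. The field names and the fetch-time interface are invented for illustration.

struct hw_loop {
    unsigned      top;       /* address of the top-of-loop instruction    */
    unsigned      length;    /* number of instructions in the loop        */
    unsigned long iters;     /* remaining iteration count                 */
};

/* Called for each fetched instruction address: if the bottom of an
   active loop was just fetched, wrap back to the top (counting one
   iteration) or fall through once the count is exhausted.               */
unsigned hw_loop_next_pc(struct hw_loop *lp, unsigned pc)
{
    unsigned bottom = lp->top + lp->length - 1;

    if (pc == bottom && lp->iters > 1) {
        lp->iters--;
        return lp->top;      /* branch-to-top handled in hardware         */
    }
    if (pc == bottom)
        lp->iters = 0;       /* last iteration: fall out of the loop      */
    return pc + 1;
}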

6 citations


Journal ArticleDOI
TL;DR: A new approach for hierarchical system optimization and parameter estimation of a large-scale industrial process is described, which significantly reduces the on-line iterations required to reach the optimum and has a balanced distribution of model-based computations in the internal double loop.
Abstract: A new approach for hierarchical system optimization and parameter estimation of a large-scale industrial process is described. This approach can be viewed as a hierarchical implementation of the approximate linear model approach with a triple iterative loop structure. The internal double loop iteration, where only model-based information is involved, is a two-model approach. Augmentation is introduced to enforce and accelerate convergence in the internal double loop. The third iterative loop arises from the hierarchical implementation of the technique and involves coordination of price vectors to ensure balance between separable sub-optimization problems. The major advantages of this triple iterative loop configuration are that it significantly reduces the on-line iterations required to reach the optimum, especially when the process is linear or nearly linear, and that it has a balanced distribution of model-based computations in the internal double loop. Optimality and convergence conditions are examined...

Proceedings ArticleDOI
01 Aug 1989
TL;DR: The concept of the reservation table, originally used to develop pipeline control strategies, is extended so that an optimal schedule can be obtained from analysis of the extended reservation table, or scheduling table; the technique makes use of the cyclic regularity of loops.
Abstract: Loop optimization is an important aspect of microcode compaction to minimize execution time. In this paper a new loop optimization technique for horizontal microprograms is presented which makes use of the cyclic regularity of loops. We have extended the concept of the reservation table, which is used to develop a pipeline control strategy, so that both data dependencies and resource conflicts are taken into account. Based on the analysis of the extended reservation table, or scheduling table, an optimal schedule can be obtained. The iterations of a loop are then rearranged to form a new loop body, whose length may be greater than that of the original one, but the average initiation latency between iterations is minimal.
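
The paper's extended scheduling table is not reproduced here, but the short C program below shows the classical reservation-table test it builds on: an initiation latency L between successive iterations causes a resource conflict exactly when some resource row has two marks L time steps apart. The table contents are a made-up example, not taken from the paper.

#include <stdbool.h>
#include <stdio.h>

#define RES   3                   /* resources (rows)                     */
#define TICKS 5                   /* time steps one iteration occupies    */

static const int table[RES][TICKS] = {
    {1, 0, 0, 1, 0},              /* resource 0 busy at t = 0 and t = 3   */
    {0, 1, 1, 0, 0},
    {0, 0, 0, 0, 1},
};

/* Latency L is forbidden iff two iterations started L apart would need
   the same resource in the same time step.                              */
static bool latency_is_forbidden(int L)
{
    for (int r = 0; r < RES; r++)
        for (int t = 0; t + L < TICKS; t++)
            if (table[r][t] && table[r][t + L])
                return true;
    return false;
}

int main(void)
{
    for (int L = 1; L < TICKS; L++)
        printf("initiation latency %d: %s\n", L,
               latency_is_forbidden(L) ? "forbidden" : "allowed");
    return 0;
}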

Patent
20 Jun 1989
TL;DR: In this paper, a compiler for generating code that enables multiple processors to process programs in parallel is presented; the code makes the multiprocessor system operate in the following manner: one iteration of an outer loop in a set of nested loops is assigned to each processor.
Abstract: A compiler for generating code for enabling multiple processors to process programs in parallel. The code enables the multiple processor system to operate in the following manner: one iteration of an outer loop in a set of nested loops is assigned to each processor. If the outer loop contains more iterations than there are processors in the system, the processors are initially assigned the earlier iterations, and the remaining iterations are assigned to the processors as they finish their earlier iterations. Each processor runs the inner loop iterations serially. In order to enforce dependencies in the loops, each processor reports its progress through its iterations of the inner loop to the processor executing the succeeding outer loop iteration, and then waits until the processor computing the preceding outer loop iteration is ahead or behind in processing its inner loop iterations by an amount which guarantees that dependencies will be enforced.
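
The sketch below is a simplified C rendering (using C11 atomics, which the patent of course does not) of the synchronization scheme the abstract describes: the processor assigned outer iteration p runs the inner loop serially, publishes its inner-loop progress for the processor handling the next outer iteration, and before each inner iteration waits until its predecessor is far enough ahead. The dependence distance DELAY, the loop bounds, and all names are illustrative assumptions; the dynamic reassignment of leftover outer iterations is omitted.

#include <stdatomic.h>

#define OUTER 8
#define INNER 1000
#define DELAY 2          /* assumed distance that makes the dependence safe */

static atomic_long progress[OUTER];   /* inner-loop progress per outer iteration */

extern void body(int outer, long inner);   /* hypothetical loop body             */

/* Run by the processor assigned outer iteration 'p'.                            */
void run_outer_iteration(int p)
{
    for (long j = 0; j < INNER; j++) {
        if (p > 0) {
            long need = j + DELAY;         /* predecessor must be this far along */
            if (need > INNER)
                need = INNER;
            while (atomic_load(&progress[p - 1]) < need)
                ;                          /* spin; a real runtime would yield   */
        }

        body(p, j);

        atomic_store(&progress[p], j + 1); /* report progress to the successor   */
    }
}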