
Showing papers on "Pipeline (computing) published in 1981"


Journal ArticleDOI
TL;DR: This work describes in detail how to program the cube-connected cycles for efficiently solving a large class of problems that include Fast Fourier transform, sorting, permutations, and derived algorithms.
Abstract: An interconnection pattern of processing elements, the cube-connected cycles (CCC), is introduced which can be used as a general purpose parallel processor. Because its design complies with present technological constraints, the CCC can also be used in the layout of many specialized large scale integrated circuits (VLSI). By combining the principles of parallelism and pipelining, the CCC can emulate the cube-connected machine and the shuffle-exchange network with no significant degradation of performance but with a more compact structure. We describe in detail how to program the CCC for efficiently solving a large class of problems that include Fast Fourier transform, sorting, permutations, and derived algorithms.
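As a quick illustration of the interconnection pattern (a sketch of the usual CCC definition, not code from the paper; the function name and tuple indexing are illustrative), each vertex of a k-dimensional cube is replaced by a cycle of k processing elements, and element j of cycle i also has a cube edge across dimension j:

```python
# Sketch of the cube-connected cycles wiring rule: element (i, j) sits at
# position j on the cycle that replaces hypercube vertex i (0 <= j < k).
def ccc_neighbors(i, j, k):
    """Return the three neighbours of processing element (i, j) in a k-dim CCC."""
    return [
        (i, (j - 1) % k),   # previous element on the local cycle
        (i, (j + 1) % k),   # next element on the local cycle
        (i ^ (1 << j), j),  # cube edge: flip bit j of the cycle index
    ]

# Example for k = 3: element (5, 1) connects to (5, 0), (5, 2) and (7, 1).
print(ccc_neighbors(5, 1, 3))
```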

1,046 citations


Book ChapterDOI
01 Nov 1981
TL;DR: The MIPS processor is a fast pipelined engine without pipeline interlocks, which attempts to achieve high performance with the use of a simplified instruction set, similar to those found in microengines.
Abstract: MIPS is a new single-chip VLSI processor architecture. It attempts to achieve high performance with the use of a simplified instruction set, similar to those found in microengines. The processor is a fast pipelined engine without pipeline interlocks. Software solutions to several traditional hardware problems, such as providing pipeline interlocks, are used.
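A rough illustration of what a software interlock means (a sketch of the general idea, not the actual MIPS code reorganizer; the instruction encoding is invented): a reorganizer can insert a NOP whenever an instruction would read a register loaded by the immediately preceding instruction, since the hardware will not stall for it:

```python
# Toy reorganizer: resolve a load-use hazard in software by inserting a NOP.
def insert_nops(program):
    """program: list of (opcode, dest, sources) tuples in issue order."""
    out = []
    for op, dest, srcs in program:
        if out and out[-1][0] == "load" and out[-1][1] in srcs:
            out.append(("nop", None, ()))   # fill the delay slot in software
        out.append((op, dest, srcs))
    return out

prog = [("load", "r1", ("r2",)), ("add", "r3", ("r1", "r4"))]
print(insert_nops(prog))
# [('load', 'r1', ('r2',)), ('nop', None, ()), ('add', 'r3', ('r1', 'r4'))]
```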

110 citations


Book ChapterDOI
01 Jan 1981
TL;DR: A two-level pipelined systolic array capable of performing convolutions of any dimension is described; the design takes full advantage of the pipelining assumed to be available at each cell.
Abstract: Pipelining computations over a large array of cells has been an important feature of systolic arrays. To achieve even higher degrees of concurrency, it is desirable to have cells of a systolic array themselves be pipelined as well. The resulting two-level pipelined systolic array would enjoy in principle a k-fold increase in its throughput, where k is the ratio of the time to perform the entire cell computation over that to perform just one of its pipeline stages. This paper describes such a two-level pipelined systolic array that is capable of performing convolutions of any dimension. The designs take full advantage of the pipelining assumed to be available at each cell.
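A first-level view of the computation being pipelined (a sketch of a plain one-dimensional convolution array, not the paper's two-level design; names are illustrative) is a tapped shift register with one multiply-add per cell; the paper's contribution is to pipeline that multiply-add inside each cell as well, for roughly a k-fold throughput gain:

```python
def systolic_fir(samples, weights):
    """1-D convolution y[t] = sum_j w[j] * x[t-j], with data shifting one cell per beat."""
    taps = [0] * len(weights)      # one register per cell of the array
    output = []
    for x in samples:
        taps = [x] + taps[:-1]     # data moves one cell to the right each beat
        output.append(sum(w * t for w, t in zip(weights, taps)))
    return output

print(systolic_fir([1, 2, 3, 0], [1, 1]))   # [1, 3, 5, 3]
```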

71 citations


Journal ArticleDOI
TL;DR: A special-purpose digital processor implemented using residue arithmetic does two-dimensional pulse matching by convolving a two-dimensional five-by-five filter with the incoming data stream.
Abstract: This paper contains a description of a special purpose digital processor which has been implemented using residue arithmetic. The processor does two-dimensional pulse matching by convolving a two-dimensional five-by-five filter with the incoming data stream. The digital processor is controlled by an Intel 8086 16-bit microprocessor and can store up to 16 distinct pulse patterns with the option of elementary error detection. The design employs modular architecture and a pipeline approach with emitter-coupled logic (ECL) integrated circuits and uses table look-up for calculations in the residue number system. Components have been chosen so that the filter is capable of making thirty million five-by-five filter convolutions per second for two-dimensional pulse matching and signal detection. In actual operation the hardware runs error free at the rate of twenty million operations per second. The circulating input buffer is the limiting factor for higher operation rates.
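A small sketch of residue arithmetic by table look-up as mentioned above (the moduli and table layout here are assumptions for illustration, not the processor's actual parameters): every add or multiply reduces to one table access per modulus, which is what keeps each pipeline stage short:

```python
MODULI = (5, 7, 9, 11, 13, 16)   # example pairwise-coprime moduli (illustrative only)

# Precomputed per-modulus tables, standing in for the hardware look-up ROMs.
ADD = {m: [[(a + b) % m for b in range(m)] for a in range(m)] for m in MODULI}
MUL = {m: [[(a * b) % m for b in range(m)] for a in range(m)] for m in MODULI}

def to_rns(x):
    """Residue representation of x."""
    return tuple(x % m for m in MODULI)

def rns_op(table, u, v):
    """Apply an operation component-wise via table look-up."""
    return tuple(table[m][a][b] for m, a, b in zip(MODULI, u, v))

u, v = to_rns(1234), to_rns(567)
assert rns_op(MUL, u, v) == to_rns(1234 * 567)   # valid while the result fits the dynamic range
```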

53 citations


Patent
Hideshi Ishii
14 Aug 1981
TL;DR: In this article, a pipeline-controlled data processing system includes decoding apparatus for decoding successive instructions and a detection device responsive to the decoding apparatus output for determining when a particular two-instruction sequence is present.
Abstract: A pipeline-controlled data processing system includes decoding apparatus for decoding successive instructions and a detection device responsive to the decoding apparatus output for determining when a particular two-instruction sequence is present. When a first instruction calls for an operation to be executed and the execution result to be loaded into an arithmetic register, and when a second instruction immediately following the first instruction calls for the storing of the output of the arithmetic register into both a main memory unit and a cache memory, the execution result is simultaneously stored in all of the arithmetic register, main memory and cache memory.
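A minimal sketch of the detection logic (the instruction representation is invented for illustration; the patent describes hardware, not software): scan decoded instructions for an execute-then-store pair on the same arithmetic register and fold them so the result can be written to the register, main memory and cache together:

```python
def fuse_store_pairs(decoded):
    """decoded: list of dicts like {'op': ..., 'dest_reg': ..., 'src_reg': ..., 'addr': ...}."""
    fused, i = [], 0
    while i < len(decoded):
        cur = decoded[i]
        nxt = decoded[i + 1] if i + 1 < len(decoded) else None
        if (nxt and cur["op"] == "execute" and nxt["op"] == "store"
                and nxt.get("src_reg") == cur.get("dest_reg")):
            # One combined step: the result goes to the register, main memory and cache at once.
            fused.append({**cur, "op": "execute_and_store", "addr": nxt["addr"]})
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused

seq = [{"op": "execute", "dest_reg": 3}, {"op": "store", "src_reg": 3, "addr": 0x100}]
print(fuse_store_pairs(seq))
```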

42 citations


Book ChapterDOI
TL;DR: The chapter describes the architectures of recently developed SIMD array processors and examines the development experiences of the Burroughs Scientific Processor (BSP) and the Goodyear Aerospace Massively Parallel Processor (MPP).
Abstract: Publisher Summary Vector- or array-processing computers are essentially designed to maximize the concurrent activities inside a computer and to match the bandwidth of data flow to the execution speed of various subsystems within a computer. This chapter reviews architectural advances in vector-processing computers. It describes the two major classes of vector machines, namely the pipeline computers and array processors. Problems associated with designing pipeline computers are also presented with examples from the Texas Instruments Advanced Scientific Computer (TI-ASC), Control Data STring ARray (STAR-100) and CYBER-205 computers, Cray Research CRAY-1, and Floating-Point Systems AP-120B. The chapter describes the architectures of recently developed SIMD array processors. Further, it examines the development experiences of the Burroughs Scientific Processor (BSP) and the Goodyear Aerospace Massively Parallel Processor (MPP). Recent research work on array and pipeline processors is also summarized. The chapter concludes with an evaluation of the performance of pipeline and array processors and explores various optimization techniques for vector operations. Hardware, software, and algorithmic issues of vector-processing systems and future trends of vector computers are also discussed.

35 citations


Book ChapterDOI
01 Jan 1981
TL;DR: A technique is developed and used to derive lower bounds on the area required by a VLSI circuit by taking into account the amount of information that has to be memorized in the course of the computation.
Abstract: A technique is developed and used to derive lower bounds on the area required by a VLSI circuit by taking into account the amount of information that has to be memorized in the course of the computation. Simple arguments show, in particular, that any circuit performing operations such as cyclic shift and binary multiplication requires an area at least proportional to its output size. By extending the technique, it is also possible to obtain general tradeoffs between the area, the time, and the period (a measure of the pipeline rate) of a circuit performing operations like binary addition. The existence of VLSI designs for these operations shows that all the lower bounds are optimal up to some constant factor.

33 citations



Patent
G. Glen Langdon
13 Oct 1981
TL;DR: In this article, an FIFO Rissanen/Langdon arithmetic string code of binary sources is decoded using a pipeline processor and a finite state machine (FSM) in interactive signal relation.
Abstract: An apparatus for ensuring continuous flow through a pipeline processor as it relates to the serial decoding of FIFO Rissanen/Langdon arithmetic string code of binary sources. The pipeline decoder includes a processor (11, 23) and a finite state machine (21, FSM) in interactive signal relation. The processor generates output binary source signals (18), status signals (WASMPS, 31) and K component/K candidate next integer-valued control parameters (L0, k0; L1, k1; 25). These signals and parameters are generated in response to the concurrent application of one bit from successive arithmetic code bits, a K component present integer-valued control parameter (52) and a K component vector representation (T, TA) of the present internal state (51) of the associated finite state machine (FSM). The FSM makes a K-way selection from K candidate next internal states and K candidate next control parameters. This selection uses no more than K²+K computations. The selected signals are then applied to the processor in a predetermined displaced time relation to the present signals in the processor. As a consequence, this system takes advantage of the multi-state or "memory" capability of an FSM in order to control the inter-symbol influence and facilitate synchronous multi-stage pipeline decoding.

26 citations


Patent
04 May 1981
TL;DR: In this article, a micro-instruction generator comprises a sequencer for generating instruction addresses, a memory for generating instructions in response to the addresses and a pipeline register adapted to receive the instructions for execution.
Abstract: A digital processor including both macro and micro instruction generators. The micro-instruction generator comprises a sequencer for generating instruction addresses, a memory for generating instructions in response to the addresses and a pipeline register adapted to receive the instructions for execution. The sequencer operates at a constant CLK 1 rate while the pipeline register operates at a variable CLK 2 rate; i.e., the occurrence of a branch instruction in the pipeline register operates to inhibit CLK 2 for one CLK 1 time so as to prevent loading for execution of the aborted sequential instruction during the loading of a new non-sequential instruction address. CLK 2 resumes upon the next CLK 1 signal to resume sequential operation. Special branch instructions are utilized to fetch macro-instructions from a pipelined system of macro-instruction registers. A two-tier synchronous arbitration system for memory requests is also disclosed.
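A minimal behavioural sketch of the clocking scheme (an assumed software model, not the patented circuit; memory is just a list of opcode strings): the sequencer advances every CLK 1 tick, and when a branch reaches the pipeline register, CLK 2 is withheld for one tick so the aborted sequential instruction is never latched:

```python
def run(memory, ticks):
    addr, inhibit, pending_target = 0, False, None
    executed = []
    for _ in range(ticks):                      # one iteration per CLK 1 tick
        fetched = memory[addr] if addr < len(memory) else "nop"
        if inhibit:
            # CLK 2 suppressed: drop the aborted sequential instruction and
            # load the non-sequential (branch target) address instead.
            addr, inhibit = pending_target, False
        else:
            executed.append(fetched)            # CLK 2: latch instruction for execution
            if fetched.startswith("branch"):
                pending_target = int(fetched.split()[1])
                inhibit = True                  # skip the next CLK 2 pulse
            addr += 1                           # sequencer keeps its constant CLK 1 rate
    return executed

print(run(["add", "branch 4", "sub", "mul", "xor", "or"], 5))
# ['add', 'branch 4', 'xor', 'or'] -- 'sub', the aborted instruction, never executes
```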

21 citations




Patent
03 Nov 1981
TL;DR: In this paper, a pipeline processor of a radix-2 configuration including input, intermediate and output processing sections for performing a discrete Fourier transformation (DFT) of an input array of N signal values to derive an output array of at least N signal values representative of the frequency transformation of the input array is disclosed.
Abstract: A pipeline processor of a radix-2 configuration including an input, intermediate and output processing sections for performing a discrete Fourier transformation (DFT) of an input array of N signal values to derive an output array of at least N signal values representative of the frequency transformation of the input array is disclosed. The input and output processing sections include first and second pluralities of cascadedly coupled computational elements respectively, which are governed to perform computations in a pipeline fashion and to propagate resulting interelement computed signal values through the input section in a first predetermined signal flow pattern and through the output section in a second predetermined signal flow pattern to render respectively a first intermediate array and the output array of signal values. All of the multiplication processing is concentrated in the intermediate section which multiplies each of the signal values of the first intermediate array with predetermined transformation values respectively associated therewith to generate a second intermediate array of signal values which is input to the output processing section. Accordingly, the signal values of the input, first and second intermediate and output arrays are input, processed and output through top and bottom rails of the three sections of the pipeline processor sequentially in respectively corresponding predetermined orders of coupled pairs.
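For reference, the arithmetic such a processor realizes (shown here as a plain recursive radix-2 decimation-in-time form; the pipeline rails, signal flow patterns and the concentration of all multiplications in the intermediate section are not modelled) is the standard power-of-two DFT recursion:

```python
import cmath

def fft_radix2(x):
    """Radix-2 decimation-in-time DFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle-factor multiplication
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

print([round(abs(v), 6) for v in fft_radix2([1, 0, 0, 0, 1, 0, 0, 0])])
```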

Patent
26 Jun 1981
TL;DR: Accurate detection of the water level, even in an emergency, is achieved by installing reference level devices inside and outside the container storing the pressure container, connecting the pipeline on the output side of each reference level device to one input side of the water level detector via a stop valve, and connecting the input side of the reference level devices and the other input-side pipeline of the water level detector to the upper and lower parts of the pressure container, respectively.
Abstract: PURPOSE: To realize accurate detection of the water level, even in an emergency, by installing reference level devices inside and outside the container storing the pressure container, connecting the pipeline on the output side of each reference level device to one input side of the water level detector via a stop valve, and connecting the input side of the reference level devices and the other input-side pipeline of the water level detector to the upper and lower parts of the pressure container, respectively. CONSTITUTION: Of the detection pipes 11 and 12 connected to the upper and lower parts of the pressure container 10, pipeline 12 is connected to one input side of the water level detector 14. Pipeline 11 is branched off halfway; one branch 11a is connected to the reference level device 12a inside the container 13, and the other branch 11b is connected to the reference level device 12b outside the container 13 via the stop valve 15. The pipelines 16a and 16b on the output side of the reference level devices 12a and 12b join pipeline 16 via the stop valves 17a and 17b and are then connected to the input side of the detector 14. Temperature detectors 18a and 18b are connected to pipelines 16a and 16b, and a pressure detector 19 is connected to pipeline 16. Each detector is connected to the water level indicator 22 via the reference water level compensating device 21 outside the housing 20. As a result, accurate detection of the water level is possible even in an emergency.



Proceedings ArticleDOI
12 May 1981
TL;DR: A new concept of pseudoparallelism is introduced in which the serial algorithm is partitioned into several noninteractive independent subtasks so that parallelism can be used within each subtask level.
Abstract: Parallel processing has mostly been applied to well-defined and a priori partitioned problems, and not much has been done to introduce parallelism into serial algorithms. This paper introduces a new concept of pseudoparallelism in which the serial algorithm is partitioned into several noninteractive independent subtasks so that parallelism can be used within each subtask level. This novel approach is illustrated by taking motion analysis as an example. Complete details of such a pseudoparallel architecture with a distributed operating system (no master control) have been worked out. Problems encountered in the course of designing such a system are outlined, and necessary justifications provided. A detailed scheme indicating various memory modules, processing elements, and their data-path requirements is included, and ways to provide continuous flow of partitioned information in the form of a synchronized pipeline are described. Finally, the performance of the proposed scheme is evaluated. The basic strategy appears to be useful in designing such parallel systems for other unexplored complex problems.
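A minimal sketch of the pseudoparallel idea (an illustration only; the subtask below is a stand-in, not the paper's motion-analysis decomposition, and the paper describes a hardware architecture rather than a thread pool): partition the serial job into noninteractive, independent subtasks and keep a pool of workers fed with the pieces:

```python
from concurrent.futures import ThreadPoolExecutor

def subtask(chunk):
    # Stand-in for one independent, noninteractive piece of the original serial algorithm.
    return sum(value * value for value in chunk)

def pseudoparallel(data, n_parts=4):
    size = (len(data) + n_parts - 1) // n_parts
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        return list(pool.map(subtask, chunks))   # subtasks run with no interaction

print(pseudoparallel(list(range(16))))   # [14, 126, 366, 734]
```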


Patent
02 Dec 1981
TL;DR: A pipeline-type computer equipped with an operand buffer 20, which stores data until the arithmetic unit 14 starts execution, has a precedent operating device 29 that performs arithmetic independently of unit 14 and derives the condition code of the arithmetic result from that result.
Abstract: PURPOSE: To resolve a condition-code conflict at an early stage and to improve processing speed by reading the stored input data immediately, finding the condition code of the arithmetic result of that data, and using it for the conditional branching instructions that follow the preceding instruction. CONSTITUTION: A pipeline-type computer equipped with an operand buffer 20, which stores data until the arithmetic unit 14 starts execution, has a precedent operating device 29 that performs arithmetic independently of unit 14 and obtains the condition code of the arithmetic result from that result. When the decoded information of an instruction to be executed and its operand data are stored in buffer 20, the data are read out of buffer 20 immediately and input to the operating device 29 to perform the arithmetic, regardless of whether unit 14 is occupied. The condition code is then found from the arithmetic result, and this condition code is used for processing the conditional branching instructions that succeed the instruction.
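A minimal sketch of the early condition-code idea (assumed semantics and operation names, not the patented machine): as soon as the decoded instruction and its operands sit in the operand buffer, a separate unit evaluates the result and derives the condition code, so a following conditional branch can be resolved without waiting for the main arithmetic unit:

```python
def precedent_condition_code(op, a, b):
    """Compute the condition code ahead of the main arithmetic unit."""
    result = {"add": a + b, "sub": a - b}[op]
    if result == 0:
        return "zero"
    return "negative" if result < 0 else "positive"

# A later "branch if zero" can be steered as soon as the operands are buffered:
taken = precedent_condition_code("sub", 7, 7) == "zero"
print(taken)   # True
```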


Patent
12 Jan 1981
TL;DR: An improved magnetic assembly for a magnetic pipeline inspection vehicle is presented, in which the assembly is rigidly secured on the ferro-magnetic body of the vehicle, which provides an annular magnetic return path between magnetic pole members formed of permanent magnets and flexible bristles or foils.
Abstract: An improved magnetic assembly for a magnetic pipeline inspection vehicle in which the assembly is rigidly secured on the body (3) of the vehicle which is ferro-magnetic and provides an annular magnetic return path between magnetic pole members formed of permanent magnets (2) and flexible bristles or foils (4).

Proceedings ArticleDOI
04 May 1981
TL;DR: It is shown that such a pipeline may be organized from DC groups and thus be amenable to LSI implementation, and that each pipeline stage Ci may obtain, during pipeline computations, a temporary result computed by any other stage Cj.
Abstract: This paper describes a pipeline system with dynamic architecture that performs cost-effective adaptations to the algorithm being executed. The system performs the following pipeline adaptations: (1) the number of stages in the pipeline is changed to allow each instruction to activate the number of stages that matches the number of operations it realizes; (2) the operation sequences in the pipeline are modified to allow any sequence of operations to execute without reconfiguration, thus eliminating the time overhead caused by this reconfiguration; and (3) the operation time in each stage is adjusted to the minimum required for that operation, because this may shorten the time of the total operation. This paper also discusses fast and flexible information exchanges between pipeline stages that can be done while the pipeline is working. Namely, each pipeline stage Ci may obtain during pipeline computations a temporary result that was computed by any other stage Cj. It is shown that such a pipeline may be organized from DC groups and thus be amenable to LSI implementation.
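A toy model of adaptation (1) (purely illustrative; the stage library and operations are assumptions, and the DC-group hardware is not modelled): each instruction activates only as many stages as it has operations, rather than flowing through a fixed-length pipeline:

```python
STAGE_LIBRARY = {
    "shift": lambda x: x * 2,      # one candidate stage operation per entry
    "negate": lambda x: -x,
    "increment": lambda x: x + 1,
}

def execute(instruction, value):
    """instruction: a list of operation names; one active stage per operation."""
    for op in instruction:
        value = STAGE_LIBRARY[op](value)
    return value

print(execute(["shift", "negate"], 3))               # 2 active stages -> -6
print(execute(["shift", "shift", "increment"], 3))   # 3 active stages -> 13
```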

Patent
18 Jul 1981
TL;DR: The bank units of the vector registers are determined without hindrance to practical use, and the banks can be utilized in parallel by supplying their outputs to a pipeline arithmetic part via different buses.
Abstract: PURPOSE: To determine the bank units of the vector registers without hindrance to practical use and to make it possible to utilize them in parallel, by supplying the outputs of the bank units to a pipeline arithmetic part via different buses. CONSTITUTION: Vector registers #0VR, #1VR, ..., #nVR, which hold data loaded from a main memory device (not shown in the figure) and data resulting from the arithmetic of the pipeline arithmetic part 6, are provided between the memory device and the arithmetic part 6. These vector registers are arranged, for example, in four banks as shown in the figure, with each register divided across the bank units: the i-th and (i+1)-th data of one vector register are stored in mutually different bank units, and the outputs of the bank units are supplied to the arithmetic part 6 via mutually different buses. Consequently, the bank units of the vector registers are determined without hindrance to practical use and are used in parallel.
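A minimal sketch of the interleaving (the bank count and layout rule are assumptions for illustration, not the patent's exact circuit): element i of a vector register lives in bank i mod 4, so consecutive elements sit in different banks and can travel to the pipeline arithmetic part over different buses in the same cycle:

```python
NUM_BANKS = 4   # matches the four banks in the example figure

def store_vector(vector):
    """Spread consecutive elements of a vector register across the banks."""
    banks = [[] for _ in range(NUM_BANKS)]
    for i, value in enumerate(vector):
        banks[i % NUM_BANKS].append(value)
    return banks

print(store_vector([10, 11, 12, 13, 14, 15]))
# [[10, 14], [11, 15], [12], [13]] -- the i-th and (i+1)-th elements never share a bank
```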

Patent
27 Feb 1981
TL;DR: The execution address is obtained earlier and the waiting time is reduced by providing a bypass between the data register and the operation unit, namely the bypass BPR between the operand data register OWR and the effective address register EAR of instruction 2.
Abstract: PURPOSE: To obtain the execution address earlier and to reduce the waiting time by providing a bypass between the data register and the operation unit. CONSTITUTION: The bypass BPR is provided between the operand data register OWR and the effective address register EAR of instruction 2. To obtain the number of the general register GR(N) written by instruction 1 and the effective address of instruction 2, the register numbers of BR and XR are compared in the register number comparison circuit CMP. The register number comparison is always made by the comparison circuit in the pipeline, and the bypass is used when the register numbers agree, so that the succeeding operand can be used as the address as it is, without any further operation. Further, in the processing of instruction 2, the address is fed to the effective address register EAR of the memory control S unit, from which the effective address can be obtained.
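A minimal sketch of the bypass (assumed register-file representation, not the patented datapath): the base/index register numbers of instruction 2 are compared with the register instruction 1 is about to write, and on a match the operand word still in OWR feeds the effective-address calculation directly:

```python
def effective_address(base_reg, index_reg, displacement, reg_file, pending):
    """pending: (register_number, value) still in OWR, not yet written back."""
    def read(reg):
        if pending and pending[0] == reg:   # register-number comparison (CMP)
            return pending[1]               # bypass BPR: take the value from OWR
        return reg_file[reg]                # otherwise read the register file as usual
    return read(base_reg) + read(index_reg) + displacement

regs = {1: 100, 2: 0, 3: 8}
print(effective_address(2, 3, 4, regs, pending=(2, 500)))   # 500 + 8 + 4 = 512
```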


Patent
29 Oct 1981



Patent
09 Dec 1981
TL;DR: A pipeline with an even number of stages is proposed to process memory access requests whose clocks differ, without increasing the amount of hardware, by always deciding the degree of priority on an even-cycle clock at the time of loop-back.
Abstract: PURPOSE: To process memory access requests whose clocks differ, without increasing the amount of hardware, by constituting a pipeline with an even number of stages and always deciding the degree of priority on a clock of an even cycle at the time of loop-back. CONSTITUTION: The device is provided with a channel processor use priority circuit 1 for selecting an access request on the 2τ clock from channel processors CHP0-n, a priority circuit 2 for selecting an access request on the 1τ clock from CPUs 0-m, and selectors 3, 4 for selecting an output from circuits 1 and 2. Each stage 21-27 of a pipeline which has an even number of stages and shifts its contents every 1τ clock is also provided. In this way, at the time of loop-back from the loop-back control part 8, the degree of priority of an even cycle is always decided by the data pool control part 11, etc., and memory access requests whose clocks differ are processed without increasing the amount of hardware.
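A small sketch of the timing argument (an illustration with an assumed stage count, not the patent's priority circuit): because the pipeline has an even number of stages shifted every 1τ clock, a request that enters the priority decision on an even cycle loops back on an even cycle again, which is what lets the 2τ-clock channel-processor requests always meet the decision on even cycles:

```python
STAGES = 6   # an even number of pipeline stages (assumption for the example)

def loopback_cycles(start_cycle, loops):
    """Cycles at which a looping request re-enters the priority decision."""
    cycles = [start_cycle]
    for _ in range(loops):
        cycles.append(cycles[-1] + STAGES)   # one full pass through the pipeline
    return cycles

print([c % 2 for c in loopback_cycles(start_cycle=0, loops=4)])   # [0, 0, 0, 0, 0] -- always even
```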