
Showing papers on "Pipeline (computing)" published in 1985


01 Jan 1985
TL;DR: In this article, the application of a genetic algorithm to the steady state optimization of a serial liquid pipeline is considered, where the algorithm is based upon the mechanics of natural genet...
Abstract: The application of a genetic algorithm to the steady state optimization of a serial liquid pipeline is considered. Genetic algorithms are search procedures based upon the mechanics of natural genet...

269 citations


Proceedings ArticleDOI
01 Jun 1985
TL;DR: Five solutions to the precise interrupt problem in pipelined processors are described and evaluated, with results showing that, at best, the first solution results in a performance degradation of about 16%.
Abstract: An interrupt is precise if the saved process state corresponds with the sequential model of program execution where one instruction completes before the next begins. In a pipelined processor, precise interrupts are difficult to achieve because an instruction may be initiated before its predecessors have been completed. This paper describes and evaluates solutions to the precise interrupt problem in pipelined processors. The precise interrupt problem is first described. Then five solutions are discussed in detail. The first forces instructions to complete and modify the process state in architectural order. The other four allow instructions to complete in any order, but additional hardware is used so that a precise state can be restored when an interrupt occurs. All the methods are discussed in the context of a parallel pipeline structure. Simulation results based on the CRAY-1S scalar architecture are used to show that, at best, the first solution results in a performance degradation of about 16%. The remaining four solutions offer similar performance, and three of them result in as little as a 3% performance loss. Several extensions, including virtual memory and linear pipeline structures, are briefly discussed.

266 citations
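
As a rough illustration of the trade-off in the abstract above, the following Python sketch models a reorder buffer: instructions may finish executing in any order, but results are committed to the register file strictly in program order, so the committed state at an interrupt is precise. The abstract does not name its five solutions; the reorder buffer used here, and all class, method, and register names, are assumptions chosen for illustration.

```python
from collections import deque


class ReorderBuffer:
    """Toy model: results commit to registers in program order only."""

    def __init__(self):
        self.entries = deque()   # pending instructions, in program order
        self.registers = {}      # architectural (precise) state

    def issue(self, dest):
        """Allocate an entry for an instruction that writes register `dest`."""
        entry = {"dest": dest, "value": None, "done": False, "fault": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value, fault=False):
        """Record that an instruction finished executing (in any order)."""
        entry["value"] = value
        entry["done"] = True
        entry["fault"] = fault

    def commit(self):
        """Retire finished instructions from the head; stop at a fault."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["fault"]:
                self.entries.clear()  # discard the faulting and younger work
                return "interrupt"
            self.registers[head["dest"]] = head["value"]
            self.entries.popleft()
        return "ok"


rob = ReorderBuffer()
a = rob.issue("r1")
b = rob.issue("r2")
c = rob.issue("r3")
rob.complete(c, 30)             # the youngest instruction finishes first
rob.complete(a, 10)
status_before = rob.commit()    # commits r1 only; r2 is still pending
state_before = dict(rob.registers)
rob.complete(b, None, fault=True)
status_after = rob.commit()     # fault at the head: r3 is never committed
```

In this run the result for `r3` is available early but never reaches the register file, because the older instruction writing `r2` faults first.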


Journal ArticleDOI
TL;DR: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes, using a modified Euclidean algorithm for computing the error-locator polynomial.
Abstract: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes. An important ingredient of this design is a modified Euclidean algorithm for computing the error-locator polynomial. The computation of inverse field elements is completely avoided in this modification of Euclid's algorithm. The new decoder is regular and simple, and naturally suitable for VLSI implementation. An example illustrating both the pipeline and systolic array aspects of this decoder structure is given for a (15,9) RS code.

247 citations


Journal ArticleDOI
01 Dec 1985
TL;DR: The proposed controller architecture can best be described as a macro level pipeline, with parallelism within elements of the pipeline, designed to take maximum benefit of the serial nature of the Newton-Euler equations of motion.
Abstract: A cost-effective architecture for the control of mechanical manipulators, based on a functional decomposition of the equations of motion of a manipulator, is described. The Lagrange-Euler and the Newton-Euler formulations were considered for this decomposition. The functional decomposition separates the inertial, Coriolis and centrifugal, and gravity terms of the Lagrange-Euler equations of motion. The recursive nature of the Newton-Euler equations of motion lends itself to decomposition into the terms used to generate the recursive forward and backward equations. Architectures tuned to the functional flow of the two algorithms were examined. An architecture which meets our design criterion is proposed. The proposed controller architecture can best be described as a macro level pipeline, with parallelism within elements of the pipeline. The pipeline is designed to take maximum benefit of the serial nature of the Newton-Euler equations of motion.

124 citations


Proceedings ArticleDOI
01 May 1985
TL;DR: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes, and naturally suitable for VLSI implementation.
Abstract: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes. The error locator polynomial is computed by a modified Euclid's algorithm which avoids computing inverse field elements. The new decoder is regular and simple, and naturally suitable for VLSI implementation.

98 citations


Journal ArticleDOI
TL;DR: The Sensory-Interactive Robotics Group of the National Bureau of Standards' Industrial Systems Division is designing and constructing an experimental multistage pipelined image-processing device for research in machine vision.

79 citations


Patent
25 Feb 1985
TL;DR: A system for computer pipeline operation is described in which a plurality of instructions are executed in parallel by commencing execution of the present instruction before execution of the preceding instruction terminates; it includes a conflict detection unit, a data establishment indication unit, and a source data by-pass unit.
Abstract: A system for computer pipeline operation in which a plurality of instructions are executed in parallel by commencing, before the termination of execution of the preceding instruction, the execution of the present instruction, including a conflict detection unit, a data establishment indication unit, and a source data by-pass unit. The source data by-pass unit by-passes a source data to the processing stage which requires this source data immediately after conflict is detected between the result data of the preceding instruction and the source data of the present instruction and the establishment of the source data of the present instruction is detected.

59 citations
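
The conflict detection and source-data by-pass described in the abstract above can be pictured with a small Python sketch. It is not the patented circuit: the tuple layout and the function name `read_operand` are hypothetical, and the model considers only a single preceding instruction whose result has not yet been written back.

```python
def read_operand(reg, registers, in_flight):
    """Return (source, value) for register `reg`.

    in_flight is (dest_register, result) for the preceding instruction,
    or None if nothing is in flight. A result of None means the data
    has not been established yet.
    """
    if in_flight is not None:
        dest, result = in_flight
        if dest == reg:                  # conflict detected
            if result is None:
                return ("stall", None)   # data not yet established
            return ("bypass", result)    # forward the result directly
    return ("regfile", registers[reg])   # no conflict: normal read


registers = {"r1": 5, "r2": 7}
forwarded = read_operand("r1", registers, ("r1", 42))
normal = read_operand("r2", registers, ("r1", 42))
waiting = read_operand("r1", registers, ("r1", None))
```

The by-pass path returns the in-flight result instead of the stale register value, while the stall case stands in for waiting until the data establishment indication arrives.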


Journal ArticleDOI
Somani, Agarwal
TL;DR: Unlike some recent designs, this machine does not use any links other than the binary tree links, provides optimal performance without the need to store data elements in any sorted order by exploiting dynamic rebalancing, has higher throughput, and keeps the logical last level of the tree on one physical level of the tree.
Abstract: A systolic binary tree machine which can handle all the dictionary machine and priority queue operations such as Insert, Delete, Extract-Min, Extract-Max, Member, and Near is designed in this paper. The operations can be fed into the tree machine in a pipeline manner at a constant rate and the output is correspondingly generated in a pipeline manner. Each processor in the machine stores at most one data element, which consists of a key value and a record associated with the key. The machine has optimal performance since if the number of data elements present in the tree is n, then each operation takes O(log n) steps. Unlike some recent designs, this machine does not use any links other than the binary tree links, provides optimal performance without the need to store data elements in any sorted order by exploiting dynamic rebalancing, has higher throughput, and keeps the logical last level of the tree on one physical level of the tree.

54 citations


Patent
17 May 1985
TL;DR: In this paper, an automatic insertion device is used to insert or withdraw a piston rod into a pressurized fluid pipeline, which can then be equipped with a turbine meter, a temperature sensor, or doppler measuring equipment.
Abstract: The automatic insertion device will insert or withdraw a piston rod into a pressurized fluid pipeline. In one embodiment, the piston rod will allow instrumentation on the exterior of the pipeline to directly sense the pressure of the fluid inside of the pipeline. In an alternate embodiment, the piston rod may be used to remove liquids from the pressurized pipeline. In the preferred embodiment, a cap is put on the end of the piston rod to isolate it from the pressure inside of the pipeline. The automatic insertion device can then be equipped with a turbine meter, a temperature sensor, or Doppler measuring equipment. In another embodiment, a pitot probe can be placed on the end of the pipeline for measurement of differential pressure which, with additional instrumentation, can be used to measure flow through the pipeline. In another embodiment, the automatic insertion device can be combined with a pump to remove samples of the fluid within the pipeline.

51 citations


01 Jan 1985
TL;DR: In this paper, the authors present computational results concerning the solution of knapsack, shortest paths and change-making problems by branch and bound, dynamic programming, and divide and conquer algorithms on the ICL-DAP (an SIMD computer), the Manchester dataflow machine and the CDC-CYBER-205 (a pipeline computer).
Abstract: In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines constructed so far all use a simple model of parallel computation. Therefore, not every existing parallel machine is equally well suited for each type of algorithm. The adaptation of a certain algorithm to a specific parallel architecture may severely increase the complexity of the algorithm or severely obscure its essence. Little is known about the performance of some standard combinatorial algorithms on existing parallel machines. In this paper we present computational results concerning the solution of knapsack, shortest paths and change-making problems by branch and bound, dynamic programming, and divide and conquer algorithms on the ICL-DAP (an SIMD computer), the Manchester dataflow machine and the CDC-CYBER-205 (a pipeline computer). 1980 Mathematics Subject Classification: 90C27, 68Q10, 68R05. This paper appeared in European Journal of Operational Research, Vol. 33, pp 65-81, 1988.

42 citations


Patent
24 Jun 1985
TL;DR: In this paper, a graphics display apparatus employs a general purpose or main microprocessor providing general control of the apparatus, including receiving high-level graphic orders defining a desired graphic image from a host processor, and a dedicated graphics microprocessor connected to receive low-level graphic orders from the general microprocessor along a pipeline constituted by a shared buffer store.
Abstract: A graphics display apparatus employs a general purpose or main microprocessor providing general control of the apparatus, including receiving high-level graphic orders defining a desired graphic image from a host processor, and a dedicated graphics microprocessor connected to receive low-level graphic orders from the general microprocessor along a pipeline constituted by a shared buffer store. Pipeline control logic controls the pipeline by blocking the graphics processor, which generally operates more quickly than the general processor, until the latter has completed computation of all the low-level orders associated with a particular high-level order. The front-of-screen performance can be further improved by backing up the pipeline to repeat certain low-level orders rather than by obtaining these repeated orders by recomputation. Graphics hardware controlled by the graphics processor loads appropriate bit patterns into an all points addressable refresh buffer for subsequent display on a cathode ray tube monitor.

Proceedings ArticleDOI
01 Mar 1985
TL;DR: The results indicate that the sampling rate in either case may be significantly increased by adding processors to a pipelined array while, on the other hand, the compute time delay decreases very little.
Abstract: Algorithms have been developed for the Jacobian and Inverse Dynamics analyses in order to implement them on pipeline/parallel computing arrays. The results indicate that the sampling rate in either case may be significantly increased by adding processors to a pipelined array while, on the other hand, the compute time delay decreases very little. The results further show that a parallel structure is needed if the compute time is to be significantly reduced.

Journal ArticleDOI
TL;DR: Two new vector-reduction techniques are proposed that will greatly simplify the machine-level programming effort needed to implement vector- reduction operations and improve the performance of vector-arithmetic pipelines in scientific supercomputers.
Abstract: Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs. This class of vector operation forms the basis of many scientific computations, such as inner product and finding the maximum among the vector components. Vector reduction on a pipeline processor demands a feedback connection around the pipeline. Since the output of such a pipeline depends on the previous output, improper control of the feedback input may destroy the benefit from pipelining. Two new vector-reduction techniques are proposed in this paper. In addition to saving reduction time and eliminating intermediate storage (as compared to Kuck's method and Kogge's method), the new methods will greatly simplify the machine-level programming effort needed to implement vector-reduction operations. An interleaved technique is introduced to reduce multiple vectors to corresponding scalars using the same arithmetic pipeline. The pipeline can be fully utilized by interleaving multiple vector-reduction processes. The proposed techniques can be applied to improve the performance of vector-arithmetic pipelines in scientific supercomputers.
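
To make the feedback problem in the abstract above concrete, here is a minimal Python sketch of a generic interleaving idea: with an adder pipeline of depth `depth`, a sum fed back into the adder is not available until `depth` cycles later, so the sketch keeps `depth` independent partial sums and merges them at the end. This is a textbook-style illustration under those assumptions, not the two techniques proposed in the paper; `interleaved_sum` and `depth` are hypothetical names.

```python
def interleaved_sum(values, depth):
    """Sum `values` using `depth` independent accumulators.

    Element i is added to accumulator i % depth, so consecutive additions
    never depend on a result that would still be inside the pipeline.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    partial = [0] * depth
    for i, v in enumerate(values):
        partial[i % depth] += v
    # Merge phase: pairwise-combine the partial sums into one scalar.
    while len(partial) > 1:
        merged = [partial[j] + partial[j + 1]
                  for j in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            merged.append(partial[-1])   # odd element carries over
        partial = merged
    return partial[0]


total = interleaved_sum(range(1, 11), 4)
```

The merge phase is where the feedback dependence reappears, which is why reduction schemes differ mainly in how they schedule those last few additions.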

Journal ArticleDOI
TL;DR: In this article, a novel pipeline A/D convertor configuration is proposed which appears to have some speed and accuracy advantages over earlier schemes, and circuits are also given for the compensation of the DC offset voltages of the input S/H stages, and for increasing the time available for signal acquisition.
Abstract: A novel pipeline A/D convertor configuration is proposed which appears to have some speed and accuracy advantages over earlier schemes. Circuits are also given for the compensation of the DC offset voltages of the input S/H stages, and for increasing the time available for signal acquisition.

Patent
Toshiaki Kitamura, Oinaga Yuji
15 Aug 1985
TL;DR: An error recovery system in a data processor of the pipeline type, including control storage for storing instruction data, having an error correction and detection code adapted to the detection and correction of errors, for controlling the data processor is described in this paper.
Abstract: An error recovery system in a data processor of the pipeline type, including control storage for storing instruction data, having an error correction and detection code adapted to the detection and correction of errors, for controlling the data processor. A parity check circuit checks instructions read from the control storage and stops at least a part of pipeline processing immediately upon the detection of an error. An error correction circuit corrects the error in the read instruction data and rewrites the instruction data into the control storage while the part of the pipeline processing is stopped.

Book
01 Jan 1985

Proceedings ArticleDOI
01 Jun 1985
TL;DR: Two efficient and powerful algorithms which synthesize near optimal clocking schemes have been programmed and these algorithms are applied to synthesis and/or performance evaluation of a design in progress.
Abstract: Clocking scheme synthesis includes the partitioning of functions into time steps, the number of clock phases, the length of each phase (i.e., how to pipeline), and the assignment of functions to clock phases; each of these choices affects performance. Some important problems of clocking scheme synthesis are examined. Two efficient and powerful algorithms which synthesize near optimal clocking schemes have been programmed. These algorithms are applied to synthesis and/or performance evaluation of a design in progress. Optimizing the speed of a previously designed system is also considered.

Journal ArticleDOI
TL;DR: It is possible to use recursive algorithms for time delay and parameter estimation of linear discrete-time single-input single-output dynamic systems from input-output data and a recursive method to define the efficiency of the identification scheme is presented.

Patent
Howard Thomas Olnowich
04 Oct 1985
TL;DR: For a pipelined instruction execution system including a microstore for storing sequences of microinstruction addresses associated with each macroinstruction, a nanostore for randomly storing unique microinstructions, and an execution unit for executing the microinstructions, a no-op/prefetch apparatus is presented that limits the delay following a conditional microbranch instruction to one cycle.
Abstract: In a pipelined instruction execution system including a microstore for storing sequences of microinstruction addresses associated with each macroinstruction, a nanostore for randomly storing unique microinstructions, and an execution unit for executing the microinstructions, a no-op/prefetch apparatus, according to the present invention, prevents a microinstruction address, stored in the microstore, from accessing the nanostore and forces a no-op address into the nanostore when the execution unit executes a conditional microbranch instruction. A no-op microinstruction, corresponding to the no-op address, is retrieved from the nanostore and is executed in the execution unit. During the execution of the no-op microinstruction in the execution unit, the no-op/prefetch apparatus permits either the next sequential microinstruction address following the conditional microbranch instruction to access the nanostore or another non-sequential microinstruction address to access the nanostore, the selection of the next sequential microinstruction address or said another non-sequential microinstruction depending upon the outcome of the execution of the conditional microbranch instruction by the execution unit. As a result, when the microstore and the nanostore are utilized, only one cycle of delay, for resolution of the pipeline, will be encountered following the execution of the conditional branch microinstruction by the execution unit. Furthermore, additional real estate is available on the integrated circuit chip on which the instruction execution system is disposed.

Patent
11 Apr 1985
TL;DR: In this article, a digital time base corrector is provided, in which a digital input signal of one block consisting of a continuous data time sequence is converted to a digital signal including data lack intervals or vice versa by a variable delay circuit.
Abstract: There is provided a digital time base corrector in which a digital input signal of one block consisting of a continuous data time sequence is converted to a digital signal including data lack intervals or vice versa by a variable delay circuit. A signal selecting circuit is divided into N first unit selecting circuits and a second unit selecting circuit. M of the output signals of a shift register are inputted to the first unit selecting circuits, by which one of them is selected. The outputs of the N first unit selecting circuits are supplied to the second unit selecting circuit, by which one of them is selected. A pipeline process is performed by inserting a delay circuit to delay the signal for the time of one clock period into the input/output line of the second unit selecting circuit. Further, the selecting signal can be made variable for every one clock and a delay circuit is inserted on the output side of a selecting signal forming circuit. With this corrector, the influence of the gate delay of the selectors can be reduced and the high speed data process can be performed.

Journal ArticleDOI
TL;DR: A digital signal processor (DSP) is described which achieves high processing efficiency by executing concurrently four functions in every processor cycle: instruction prefetching from a dedicated instruction memory and generation of an effective operand, access to a single-port data memory and transfer of a data word over a common data bus, arithmetic/logic-unit (ALU) operation, and multiplication.
Abstract: A digital signal processor (DSP) is described which achieves high processing efficiency by executing concurrently four functions in every processor cycle: instruction prefetching from a dedicated instruction memory and generation of an effective operand, access to a single-port data memory and transfer of a data word over a common data bus, arithmetic/logic-unit (ALU) operation, and multiplication. Instructions have a single format and contain an operand, index control bits, and two independent operation codes called “transfer” code and “compute” code. The first code specifies the transfer of a data word over the common data bus, e.g., from data memory to a local register. The second determines an operation of the ALU on the contents of local registers. A fast free-running multiplier operates in parallel with the ALU and delivers a product in every cycle with a pipeline delay of two cycles. The architecture allows transversal-filter operations to be performed with one multiplication and ALU operation in every cycle. This is accomplished by a novel interleaving technique called ZIP-ing. The efficiency of the processor is demonstrated by programming examples.
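
The transversal-filter workload mentioned in the abstract above has a simple inner loop: one multiplication and one accumulation per filter tap. The Python sketch below shows that loop serially; it does not model the processor's concurrent units or the ZIP-ing technique, and the function name and sample data are made up for illustration.

```python
def transversal_filter(samples, coefficients):
    """Direct-form transversal (FIR) filter: y[n] = sum_k h[k] * x[n - k]."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coefficients):
            if n - k >= 0:                  # samples before x[0] count as zero
                acc += h * samples[n - k]   # one multiply, one add per tap
        output.append(acc)
    return output


y = transversal_filter([1, 2, 3, 4], [1, 1])
```

A processor that delivers a product and an ALU result in every cycle can, in principle, spend one cycle per tap on this loop body.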

Journal ArticleDOI
T. Temma, M. Iwashita, K. Matsumoto, H. Kurokawa, T. Nukiyama
TL;DR: An Image Pipelined Processor with high-speed processing capability has been developed utilizing a data-flow architecture strongly emphasizing pipeline processing technique, which contributes to easier chip development by using CAD systems.
Abstract: An Image Pipelined Processor (ImPP) with high-speed processing capability has been developed utilizing a data-flow architecture strongly emphasizing pipeline processing technique. Moreover, a ring-shaped ImPP array forms a multiprocessor system by itself for higher processing performance. Pipeline processing is especially suitable for operations on streams (sequences of data). Most image processing includes these operations. The ImPP is composed of ten pipeline modules. These are connected with each other in pipeline manner, that is, a one-directional bus is mainly used between two modules for transmitting a token composed of data value and several control signals. All modules can be individually designed in parallel only with the token information. This hierarchical design approach contributes to easier chip development by using CAD systems. The ImPP is a 6.96 mm × 6.99 mm single die, containing over 115 000 transistors with advanced 1.75-µm NMOS technology, as one data-flow processor chip. It operates at 10-MHz clock rate and takes 200 ns for each pipeline stage. To make the application program development convenient, an ImPP assembler has also been developed. Using this assembler, several image processing programs have been prepared and simulated. The ImPP architecture and its processing capability are discussed based on this simulation result.

Patent
Yoshihiro Mizushima
26 Sep 1985
TL;DR: A pipeline control system for a computer is described in which predetermined tag data or micro instructions are stored in a plurality of tag registers while a first sequence of instructions executes, so that a flow of processing based on the same instructions can be executed repeatedly.
Abstract: A pipeline control system for a computer in which predetermined tag data or micro instructions are stored in a plurality of tag registers while executing a first sequence of instructions in order to repetitively execute a flow of processing which is based upon the same. A required tag is selected from the tags stored in the tag registers in steps for executing second and subsequent sequences of instructions in which the same instructions can be repeated, so that the execution is initiated from a second phase without executing a first phase.

Journal ArticleDOI
TL;DR: This architecture is shown to improve performance by a factor of 1.8 compared with a design without it, at the cost of a 33% increase in hardware.
Abstract: This paper considers a high-performance architecture for Variable Length Instructions (VLI) and its evaluation. The VLI studied here are characterized by a multioperand which can treat several independent operands, and orthogonality which can specify the operand address independently of the operation code. This architecture consists of the pipeline processing method suitable for multioperand instruction format, instruction/data separate-type dual cache memory configuration to obtain memory throughput necessary for efficient operation of pipeline, and the operand location-free microprogramming method with virtualized operand interface. The basic ideas of the pipeline processing method are the operand specifier (OSP) based pipeline processing which is realized by Inter-OSP based pipeline processing and two simultaneously processing OSPs. The mixed value analysis and the simulation result show that this architecture improves performance by a factor of 1.8 compared with a design without it. The additional hardware required for this architecture is 33%.

Journal ArticleDOI
TL;DR: A theoretical study is presented of the synthesis of the most efficient pipelined processing system for executing a digital signal processing (DSP) algorithm and the results are used to demonstrate how the overall problem can be reformulated as an optimization problem to minimize a cost function of the system under certain constraints.
Abstract: A theoretical study is presented of the synthesis of the most efficient pipelined processing system for executing a digital signal processing (DSP) algorithm. The results provide the conditions for determining the most efficient mapping of DSP algorithms to the processing elements of a processor. This is achieved by allocating the various arithmetic and data transfer operations of the algorithm to the processing elements of the system in such a manner that all basic components are concurrently active throughout the entire operation. Pipelining and parallel processing at each stage of the pipeline is used to achieve the best possible performance. Applications of the results to the design of FFT-processors are also included. Finally, the results are used to demonstrate how the overall problem can be reformulated as an optimization problem to minimize a cost function of the system under certain constraints.

Proceedings ArticleDOI
26 Apr 1985
TL;DR: This paper presents a special purpose processing architecture which supports real-time search of large codebooks, and describes a pipeline architecture which reduces the effective computation time for a single vector component to one period.
Abstract: Vector Quantization is an attractive block coding scheme which allows more efficient use of a given channel capacity. The computational complexity of the codebook search, however, limits the practical applications of Vector Quantization -- especially in real-time situations. This paper presents a special purpose processing architecture which supports real-time search of large codebooks. The core of the computation required for a squared-error distortion measure is accomplished using a VLSI pattern matching chip. We describe a pipeline architecture which reduces the effective computation time for a single vector component to one period. The chip supports a selectable vector dimension of 2, 4, 8, or 8 and can be used with any codebook size consistent with the input vector rate and the chip's throughput. We have built and tested a 4-micron NMOS chip which supports up to four million squared-error distance calculations per second. We outline three approaches to real-time Vector Quantization which use this chip to do the pattern matching: two waveform coding processors using Vector Pulse Code Modulation and Adaptive Vector Predictive Coding, and a novel approach to rapid codebook design.
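
The codebook search that dominates the cost of Vector Quantization can be sketched in a few lines of Python. The sketch is a plain full search under a squared-error distortion measure, performed serially; it does not model the pipelined VLSI chip, and the function names and sample codebook are made up for illustration.

```python
def squared_error(x, y):
    """Squared-error distortion between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_codeword(vector, codebook):
    """Return (index, distortion) of the codeword closest to `vector`."""
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = squared_error(vector, codeword)
        if dist < best_dist:             # keep the first minimum found
            best_index, best_dist = index, dist
    return best_index, best_dist


codebook = [(0, 0), (4, 4), (0, 4), (4, 0)]
index, distortion = nearest_codeword((3, 1), codebook)
```

Each input vector costs one distortion calculation per codeword, which is why the search cost grows with codebook size and becomes the bottleneck in real-time use.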

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the effectiveness of replacing the old pipe by comparing the risk between two gas pipeline systems; one wherein replacement has been effected and the other wherein the existing pipe is maintained.

Patent
19 Dec 1985
TL;DR: A real-time image processing circuit is described in which, for image processing in the Fourier space, a number of individual processors of simple structure are connected in parallel in a regular logic and are connected via a pipeline.
Abstract: The circuit according to the invention is characterised by the fact that, for the purpose of image processing in the Fourier space, a number of individual processors of simple structure are connected in parallel in a regular logic and are connected via a pipeline. The circuit according to the invention thus has the advantage that it can also process high-resolution images in real time.


Proceedings Article
01 Jan 1985