
Showing papers on "Pipeline (computing) published in 1984"


Journal ArticleDOI
TL;DR: VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.
Abstract: In some signal processing applications, it is desirable to build very high performance fast Fourier transform (FFT) processors. To meet the performance requirements, these processors are typically highly pipelined. Until the advent of VLSI, it was not possible to build a single chip which could be used to construct pipeline FFT processors of a reasonable size. However, VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.
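
The stage-by-stage structure that makes FFTs amenable to pipelining can be seen in the iterative radix-2 form below, a minimal Python sketch for illustration only, not the paper's VLSI design (pipeline FFT processors of the kind discussed here are typically radix-4 and operate on streaming data):

```python
import cmath

def fft(x):
    """Iterative radix-2 decimation-in-time FFT.

    Each pass of the outer while loop is one butterfly stage; a pipeline
    FFT processor dedicates hardware to each such stage so that stages
    work on successive transforms concurrently.
    """
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1
    # Bit-reversal permutation of the input.
    a = [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
    span = 1
    while span < n:
        w = cmath.exp(-2j * cmath.pi / (2 * span))  # principal twiddle factor
        for start in range(0, n, 2 * span):
            tw = 1.0
            for i in range(start, start + span):
                lo, hi = a[i], a[i + span] * tw
                a[i], a[i + span] = lo + hi, lo - hi  # butterfly
                tw *= w
        span *= 2
    return a
```

For example, `fft([1, 2, 3, 4])` yields the 4-point DFT `[10, -2+2j, -2, -2-2j]`.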

327 citations


Journal ArticleDOI
TL;DR: This paper examines common implementations of linear algebra algorithms, such as matrix-vector multiplication, matrix-matrix multiplication and the solution of linear equations for efficiency on a computer architecture which uses vector processing and has pipelined instruction execution.
Abstract: This paper examines common implementations of linear algebra algorithms, such as matrix-vector multiplication, matrix-matrix multiplication and the solution of linear equations. The different versions are examined for efficiency on a computer architecture which uses vector processing and has pipelined instruction execution. By using the advanced architectural features of such machines, one can usually achieve maximum performance, and tremendous improvements in terms of execution speed can be seen over conventional computers.
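
One of the implementation choices such papers examine is loop ordering: a column-sweep (jki) matrix multiply turns the inner loop into an AXPY vector update that a pipelined vector unit can stream at one result per cycle. A minimal sketch, using NumPy slices to stand in for hardware vector operations (illustrative, not the paper's code):

```python
import numpy as np

def matmul_saxpy(A, B):
    """Column-sweep (jki) matrix multiply.

    The inner operation is an AXPY, C[:, j] += B[k, j] * A[:, k]:
    a scalar times a contiguous vector plus a vector, which is exactly
    the kind of operation a pipelined vector unit streams efficiently.
    """
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, p))
    for j in range(p):
        for k in range(n):
            C[:, j] += B[k, j] * A[:, k]  # length-m vector AXPY
    return C
```

On a scalar machine the ijk ordering performs identically; the point of such studies is that on vector hardware the jki form keeps the pipeline full.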

249 citations


Journal ArticleDOI
Hennessy1
TL;DR: In a VLSI implementation of an architecture, many problems can arise from the base technology and its limitations, so the architects must be aware of these limitations and understand their implications at the instruction set level.
Abstract: A processor architecture attempts to compromise between the needs of programs hosted on the architecture and the performance attainable in implementing the architecture. The needs of programs are most accurately reflected by the dynamic use of the instruction set as the target for a high level language compiler. In VLSI, the issue of implementation of an instruction set architecture is significant in determining the features of the architecture. Recent processor architectures have focused on two major trends: large microcoded instruction sets and simplified, or reduced, instruction sets. The attractiveness of these two approaches is affected by the choice of a single-chip implementation. The two different styles require different tradeoffs to attain an implementation in silicon with a reasonable area. The two styles consume the chip area for different purposes, thus achieving performance by different strategies. In a VLSI implementation of an architecture, many problems can arise from the base technology and its limitations. Although circuit design techniques can help alleviate many of these problems, the architects must be aware of these limitations and understand their implications at the instruction set level.

216 citations


Journal ArticleDOI
TL;DR: Use of the cut theory and ring architectures for arrays with feedback gives effective fault-tolerant and two-level pipelining schemes for most systolic arrays.

170 citations


Journal ArticleDOI
TL;DR: A semicustom delay commutator circuit to support the implementation of high-speed fast Fourier transform processors based on the radix 4 pipeline FFT algorithm of J.H. McClellan and R.J. Purdy (1978) is described.
Abstract: The development is described of a semicustom delay commutator circuit to support the implementation of high-speed fast Fourier transform processors based on the radix 4 pipeline FFT algorithm of J.H. McClellan and R.J. Purdy (1978). The delay commutator is a 108000-transistor circuit comprising 12288 shift register stages and approximately 2000 gates of random logic realized with 2.5-micrometer design rule CMOS standard cell technology. It operates at a 10-MHz clock rate, which processes data at a 40-MHz rate. The delay commutator is suitable for implementing processors that compute transforms of 16, 64, 256, 1024, and 4096 (complex) points. It is implemented as a 4-bit-wide data slice to facilitate concatenation to accommodate common data word sizes and to use a standard 48-pin dual-in-line package.

141 citations


Proceedings ArticleDOI
19 Mar 1984
TL;DR: The paper presents a revised functional description of Volder's Coordinate Rotation Digital Computer algorithm (CORDIC), as well as allied VLSI implementable processor architectures; the approach benefits execution speed in array configurations, since it allows pipelining at the bit level.
Abstract: The paper presents a revised functional description of Volder's Coordinate Rotation Digital Computer algorithm (CORDIC), as well as allied VLSI implementable processor architectures. Both pipelined and sequential structures are considered. In the general purpose or multi-function case, pipeline length (number of cycles), function evaluation time and accuracy are all independent of the various executable functions. High regularity and minimality of data-paths, simplicity of control circuits and enhancement of function evaluation speed are ensured, partly by mapping a unified set of micro-operations, and partly by invoking a natural encoding of the angle parameters. The approach benefits the execution speed in array configurations, since it will allow pipelining at the bit level, thereby providing fast VLSI implementations of certain algorithms exhibiting substantial structural pipelining or parallelism.
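
For reference, the rotation-mode CORDIC iteration uses only shifts, adds, and a small table of arctangents, which is what makes bit-level pipelined VLSI implementations attractive. A minimal floating-point Python sketch (the paper's fixed-point micro-operation encoding is not reproduced):

```python
import math

def cordic_sin_cos(theta, iters=32):
    """Rotation-mode CORDIC: rotate the vector (1, 0) toward angle theta
    using only shift-and-add style updates. Returns (cos(theta),
    sin(theta)) for |theta| < pi/2. In a pipelined implementation each
    iteration below is one pipeline stage."""
    angles = [math.atan(2.0 ** -i) for i in range(iters)]
    # Pre-apply the scale-factor compensation K = prod 1/sqrt(1 + 2^-2i).
    K = 1.0
    for i in range(iters):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0       # rotate toward zero residual angle
        x, y, z = (x - d * y * 2.0 ** -i,  # 2^-i is a hardware shift
                   y + d * x * 2.0 ** -i,
                   z - d * angles[i])
    return x, y
```

With 32 iterations the result agrees with the library sine and cosine to well under 1e-6.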

124 citations


Patent
10 Feb 1984
TL;DR: In this article, a program instruction flow prediction apparatus and method employ a high speed flow prediction storage element for predicting redirection of program flow prior to the time when the instruction has been decoded, and further provide circuitry for updating the storage element, correcting erroneous branch and/or non-branch predictions, and accommodating instructions occurring on even or odd boundaries of the normally read double word instruction.
Abstract: A data processing system for processing a sequence of program instructions has a pipeline structure which includes an instruction pipeline and an execution pipeline. Each of the instruction and execution pipelines has a plurality of serially operating stages. The instruction pipeline reads instructions from storage and forms therefrom address data to be employed by the execution pipeline. The execution pipeline receives the address data and uses it for referencing stored data to be employed for execution of the program instructions. A program instruction flow prediction apparatus and method employ a high speed flow prediction storage element for predicting redirection of program flow prior to the time when the instruction has been decoded. Circuitry is further provided for updating the storage element, correcting erroneous branch and/or non-branch predictions, and accommodating instructions occurring on even or odd boundaries of the normally read double word instruction. Circuitry is further provided for updating the program flow in a single execution cycle so that no disruption to normal instruction sequencing occurs.
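
The flow-prediction idea, predicting redirection from the fetch address before the instruction is decoded and then correcting erroneous predictions, can be sketched with a small lookup table. This is an illustrative model only, not the patented circuitry; the class name and table layout are invented for the example:

```python
class FlowPredictor:
    """Toy branch-target-buffer model: a small direct-mapped table maps a
    fetch address to a predicted redirect target, so redirection can be
    predicted before decode. Mispredictions update the table."""

    def __init__(self, size=256):
        self.size = size
        self.table = {}  # index -> (tag, taken, target)

    def predict(self, pc):
        entry = self.table.get(pc % self.size)
        if entry and entry[0] == pc and entry[1]:
            return entry[2]   # predicted redirection of program flow
        return pc + 1         # predicted sequential fall-through

    def update(self, pc, taken, target):
        # Correct an erroneous branch or non-branch prediction.
        self.table[pc % self.size] = (pc, taken, target)
```

A fetch at an address never seen predicts fall-through; after an `update` recording a taken branch, the same address predicts the recorded target.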

87 citations


Patent
17 Jan 1984
TL;DR: In this paper, a digital data processing system including a number of input/output units (12) that communicate with a memory (11) over an input/output bus (30) and through an input/output interface (31) is described.
Abstract: A digital data processing system including a number of input/output units (12) that communicate with a memory (11) over an input/output bus (30) and through an input/output interface (31). The input/output interface (31) pipelines transfers between the input/output units (12) and the memory (11). In the event of an error in the input/output interface's pipeline buffer, it transmits information to the input/output unit (12) that initiated the transfer to enable it to re-initiate the transfer.

58 citations


Patent
05 Nov 1984
TL;DR: In this article, an electronic data processing pipeline system and method for processing encoded control points representing graphical illustrations is described.
Abstract: An electronic data processing pipeline system and method for processing encoded control points representing graphical illustrations. The pipeline comprises a number of separate micro-programmed circuit cards, each of which is programmed to perform a specific processing operation. A command is sent to a matrix maker card which calculates a transformation matrix representing the desired transformation. Electronic representations of control data points are then transmitted to the pipeline for processing. These control points are 3D points comprising both the vertices which terminate linear edges of the illustration and the control points corresponding to curved edges of the illustration. The control points are then multiplied by the transformation matrix in a vector multiplier circuit card. They are then clipped to the planes of the viewing frustum and are mapped onto the 2D viewing window. The clipped control points are then exploded to generate a plurality of small line segments representing each of the curved edges of the illustration. The appropriate portions of the illustration are rendered as a line drawing and the processed data is converted into a form which is appropriate for scan conversion.

43 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: The architecture and programming environment of the HEP is introduced and a range of scientific applications programs for which parallel versions have been produced, tested, and analyzed on this computer are surveyed.
Abstract: Pipelining has been used to implement efficient, high-speed vector computers. It is also an effective method for implementing multiprocessors. The Heterogeneous Element Processor (HEP) built by Denelcor Incorporated is the first commercially available computer system to use pipelining to implement multiple processes. This paper introduces the architecture and programming environment of the HEP and surveys a range of scientific applications programs for which parallel versions have been produced, tested, and analyzed on this computer. In all cases, the ideal of one instruction completion every pipeline step time is closely approached. Speed limitations in the parallel programs are more often a result of the extra code necessary to ensure synchronization than of actual synchronization lockout at execution time. The pipelined multiple instruction stream architecture is shown to cover a wide range of applications with good utilization of the parallel hardware.

43 citations


Patent
Oinaga Yuji1
26 Sep 1984
TL;DR: A vector data processing system comprising at least an A-access pipeline (27) and a B-access pipeline (28) between a main storage unit (4) and vector registers is described in this paper.
Abstract: A vector data processing system comprising at least an A-access pipeline (27) and a B-access pipeline (28) between a main storage unit (4) and vector registers (21). Associated with the A-access pipeline (27) are a write port (WA) and a read port (RA) selectively connected to the vector registers (21). Associated with the B-access pipeline (28) are a write port (WB) and a read port (RB) selectively connected to the vector registers (21). An additional read port (IA) is linked between the read port (RB) of the B-access pipeline (28) and the address input side of the A-access pipeline (27). When an indirect address load/store instruction is carried out for the A-access pipeline (27), an indirect address is generated from the vector registers (21) via the read port (RB) of the B-access pipeline (28) and the additional read port (IA). Since the additional read port (IA), for generating an indirect address for the A-access pipeline (27), is connected to the read port (RB) of the B-access pipeline (28), it is unnecessary to increase the multiplexity of interleaves of the vector registers (21).

Patent
Yaoko Yoshida1
21 Dec 1984
TL;DR: A vector processing apparatus has a number of pipeline arithmetic units operating concurrently to execute a set of vector instructions dealing with vector elements as discussed by the authors, and stack registers are provided for each arithmetic unit to hold the vector instruction address, leading vector element position and vector register internal address, so that one of the exceptions that can be detected successively by several arithmetic units during the process of the vector instructions is selected on a priority basis through the comparison of information in the stack of the currently detected exception with information of exception detected previously.
Abstract: A vector processing apparatus has a number of pipeline arithmetic units operating concurrently to execute a set of vector instructions dealing with vector elements. Stack registers are provided for each arithmetic unit to hold the vector instruction address, leading vector element position, and vector register internal address, so that one of the exceptions that can be detected successively by several arithmetic units during processing of the vector instructions is selected on a priority basis through comparison of the information in the stack for the currently detected exception with that of the exception detected previously.

Patent
02 Mar 1984
TL;DR: In this paper, a pipeline pig is tracked by detecting energy emission resulting from impact of the moving pig with at least two previously identified features which are located within the pipeline at known spaced intervals.
Abstract: This invention relates to the tracking of a pipeline pig during its movement through a pipeline carrying gas, for instance. The pig is tracked by detecting energy emission resulting from impact of the moving pig with at least two previously identified features which are located within the pipeline at known spaced intervals. The energy emission takes the form of vibrational signals which are sensed by a geophone (1) which is coupled externally to the wall or associated equipment of a gas pipeline (2). An electrical output proportional to the vibration is amplified by a preamplifier (5) and is then filtered by a unit (6) to remove unwanted frequency components of the signal. The filtered signal is fed to a chart recorder (7). The unfiltered preamplified signal is fed to an audio amplifier (9), where the signal is amplified to audible levels.

Journal ArticleDOI
01 Jul 1984
TL;DR: In this article, a model for simulating the dynamic behavior of gas distribution pipeline networks is developed, and a new implicit integration method and a special crossing-branch method are presented.
Abstract: A model for simulating the dynamic behavior of gas distribution pipeline networks is developed. A new implicit integration method and a special crossing-branch method are presented. The first method can accommodate systems exhibiting both slow and fast responses and prevents numerical oscillations in the solution. In the crossing-branch method, the solution of very large systems described by sparse matrices is provided, based upon special properties of the matrix. On the basis of these methods, a universal model for simulating the network of any structure of arbitrary size is created. Examples of model testing and use are given.
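
The reason an implicit method suits networks with both slow and fast responses is numerical stability: implicit steps remain stable at step sizes where explicit ones oscillate or diverge. A minimal backward-Euler sketch with a Newton inner solve, for a scalar ODE (illustrative only; the paper's network-level method is far more elaborate):

```python
def backward_euler(f, dfdy, y0, t0, t1, steps):
    """Implicit (backward) Euler for y' = f(t, y), with Newton iteration
    to solve the implicit equation y_next = y + h*f(t_next, y_next) at
    each step. Stable for stiff problems where explicit Euler at the
    same step size would oscillate or blow up."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        t_next = t + h
        y_next = y  # initial Newton guess
        for _ in range(20):
            g = y_next - y - h * f(t_next, y_next)       # residual
            y_next -= g / (1.0 - h * dfdy(t_next, y_next))  # Newton step
        t, y = t_next, y_next
    return y
```

For the stiff decay y' = -50y with step h = 0.1, explicit Euler's update factor is 1 - 5 = -4 (divergent oscillation), while the implicit update factor 1/6 decays smoothly, illustrating why implicit integration "prevents numerical oscillations in the solution."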

Patent
Tone Hirosada1
07 Dec 1984
TL;DR: A buffer storage system for a pipeline processor, set up with at least an operand access buffer storage and an instruction fetch buffer storage, is described in this article, where a feedback means is mounted between the buffer registers and the store address/data registers.
Abstract: A buffer storage system for a pipeline processor, set up with at least an operand access buffer storage and an instruction fetch buffer storage. The buffer storages cooperate with store address registers and store data registers to implement a store-through method between the buffer storages and a main storage. A feedback means is mounted between the buffer registers and the store address/data registers. This feedback means is activated during an operand store operation to apply an operand store address and operand store data, from the store address and store data registers, to the instruction fetch buffer register for effecting coincidence of data among the storages.

Proceedings ArticleDOI
13 Mar 1984
TL;DR: Pipeline techniques are used to assign the computation modules required in Inverse Plant plus Jacobian Control of robot manipulators to a multiple set of processors to evaluate the performance.
Abstract: Pipeline techniques are used to assign the computation modules required in Inverse Plant plus Jacobian Control of robot manipulators to a multiple set of processors. With example execution times for each of the modules, the compute time, initiation rate, and CPU utilization are used to evaluate the performance. For three processors, with more than 90% CPU utilization, the compute time is reduced by more than 25% and the initiation rate is almost tripled over that of a single processor.
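
The three figures of merit used above can be computed directly from per-stage times: latency is their sum, the initiation interval is set by the slowest stage, and utilization compares useful work to total processor capacity. A small sketch with hypothetical stage times (the paper's actual module execution times are not reproduced here):

```python
def pipeline_metrics(stage_times):
    """Throughput metrics for a software pipeline whose modules are
    assigned one per processor. Returns (compute_time, initiation
    interval, utilization). stage_times are hypothetical per-processor
    busy times for one pass of the computation."""
    compute_time = sum(stage_times)   # latency to produce one result
    initiation = max(stage_times)     # slowest stage paces new initiations
    utilization = compute_time / (len(stage_times) * initiation)
    return compute_time, initiation, utilization
```

With three stages of, say, 4, 5, and 4 time units, one result completes every 5 units instead of every 13, and the processors are busy 13/15 ≈ 87% of the time, the same style of accounting as the paper's 90%-utilization result.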

Journal ArticleDOI
TL;DR: A bit-sequential processing element with O(n) complexity that uses semi-on-line algorithms produces a result δ clock cycles after the absorption of the n-bit operands, where δ is small compared to n.
Abstract: A bit-sequential processing element with O(n) complexity is described, where n is the wordlength of the operands. The operations performed by the element are A * B + C * D, A/B, and √A. The operands are fixed point or floating point numbers with variable precision. The concept of semi-on-line algorithms is introduced. A processing element that uses semi-on-line algorithms produces a result δ clock cycles after the absorption of the n-bit operands, where δ is small compared to n. In the paper the processing element and the algorithms are described. A performance comparison between the bit-sequential processing element and conventional pipelined arithmetic units is given.
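
A flavor of bit-sequential processing: an LSB-first serial adder consumes one bit of each operand per clock and emits one result bit, so the full result appears only a short delay after the last operand bits are absorbed. A minimal sketch (illustrative only; the paper's semi-on-line multiply, divide, and square-root cells are far more involved):

```python
def bit_serial_add(a_bits, b_bits):
    """LSB-first bit-serial addition of two equal-length bit lists.
    One full-adder cell processes one bit pair per clock, carrying
    state (the carry bit) between clocks; the final carry emerges one
    cycle after the operands are fully absorbed."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry                      # sum bit for this clock
        carry = (a & b) | (carry & (a ^ b))    # carry into next clock
        out.append(s)
    out.append(carry)  # extra cycle to flush the final carry
    return out
```

For example, adding 3 (`[1, 1, 0]` LSB-first) and 5 (`[1, 0, 1]`) yields 8 (`[0, 0, 0, 1]`).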

Patent
04 Sep 1984
TL;DR: In this article, a final control element (24), the actuator, the sensor, and the control device are combined into an assembly which can be inserted as such in the pipeline.
Abstract: The regulation and/or control arrangement for regulating or controlling the throughput of gas or liquid streams in pipelines comprises a final control element (24) which can be inserted in a pipeline, an actuator for controlling the final control element (24), a sensor (16) which can be inserted in the pipeline and which supplies a regulating or control variable dependent on the throughput of a gas or liquid stream in the pipeline, and a control device to which the regulating or control variable output by the sensor (16) is applied as an input signal and which generates a trigger signal for the actuator. The final control element (24), the actuator, the sensor (16) and the control device are combined into an assembly which can be inserted as such in the pipeline.

Journal ArticleDOI
TL;DR: Optical systolic pipeline processors for polynomial evaluation can be built using Horner’s rule and with integrated optics techniques it will be possible to fabricate large-order pipelines operating at very high speeds.
Abstract: Optical systolic pipeline processors for polynomial evaluation can be built using Horner’s rule. With integrated optics techniques it will be possible to fabricate large-order pipelines operating at very high speeds.
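
Horner's rule maps naturally onto a systolic pipeline because each step is a single multiply-accumulate. A minimal sketch of the recurrence (the optical implementation itself is not modeled):

```python
def horner(coeffs, x):
    """Evaluate a polynomial by Horner's rule, highest-degree coefficient
    first. Each loop iteration is one multiply-accumulate, the cell
    operation of a systolic polynomial-evaluation pipeline: a degree-d
    polynomial needs d such stages."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c  # one systolic stage per coefficient
    return acc
```

For instance, `horner([2, -3, 1], 4)` evaluates 2x² - 3x + 1 at x = 4, giving 21.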

Patent
20 Jul 1984
TL;DR: In this paper, a compressed gas layer is formed along the inside wall of a pipeline by the spiral motion of the gas stream, and the layer prevents direct contact of the solid particles with the inside wall of the pipeline.
Abstract: Uniform flow of gas flowing in a cylinder having an inner diameter larger than that of a pipeline is introduced through a funnelform reducer into the inlet of the pipeline, where the uniform gas flow is turned into a spiral gas stream by bringing the mean gas stream velocity to faster than 20 meters per second. When solid particles are introduced into the spiral gas stream zone, they are transported to the outlet of the pipeline. A compressed gas layer is formed along the inside wall of the pipeline by the spiral motion of the gas stream, and the layer prevents direct contact of the solid particles with the inside wall of the pipeline, which would otherwise cause erosion of the pipeline. As the center part of the cross section of the pipeline comes to very low pressure, especially along the axis of the pipeline, solid particles containing or accompanying volatile matter are desiccated or concentrated as a result of the evaporation of the volatile matter while being transported in the pipeline. When grinding powder (abrasive) is transported by the spiral gas stream and spattered on an object, the surface of the object is ground without consumption of much energy.

Patent
Yaoko Yoshida1
27 Dec 1984
TL;DR: A vector processing apparatus has a number of pipeline arithmetic units (30-33) operating concurrently to execute a set of vector instructions dealing with vector elements as mentioned in this paper, and stack registers (301-309) are provided for each arithmetic unit to hold the vector instruction address, leading vector element position and vector register internal address, so that one of exceptions that can be detected successively by several arithmetic units during the process of the vector instructions is selected on a priority basis.
Abstract: A vector processing apparatus has a number of pipeline arithmetic units (30-33) operating concurrently to execute a set of vector instructions dealing with vector elements. Stack registers (301-309) are provided for each arithmetic unit to hold the vector instruction address, leading vector element position, and vector register internal address, so that one of the exceptions that can be detected successively by several arithmetic units during processing of the vector instructions is selected on a priority basis through comparison of the information in the stack for the currently detected exception with that of the exception detected previously.

Journal ArticleDOI
01 Jan 1984
TL;DR: By representing high level language (HLL) control statements with special machine language instructions, the usual delays associated with control flow changes can be reduced and preserving the HLL control flow information increases performance by reducing both the number of executed branches and pipeline breaks.
Abstract: This paper presents a technique for specifying change of control (e.g. branch) commands at a sequential processor's macroinstruction set level. It is shown that by representing high level language (HLL) control statements with special machine language instructions, the usual delays associated with control flow changes can be reduced. Preserving the HLL control flow information increases performance by reducing both the number of executed branches and pipeline breaks.

Patent
12 Jul 1984
TL;DR: A pipeline conditioning and sampling system for non-homogeneous liquids in a pipeline, for example a crude oil pipeline, which collects a sample of fluid that is representative of a batch of fluid passing through the pipeline, is described in this paper.
Abstract: A pipeline conditioning and sampling system for non-homogeneous liquids in a pipeline, for example a crude oil pipeline, which collects a sample of fluid that is representative of a batch of fluid passing through the pipeline. The sampling system includes a jet mixing, pipeline contents conditioning system to thoroughly agitate the contents of the pipeline to produce a uniform mixture from which a sample is collected. The sample may be withdrawn, using a sampler probe, directly from the pipeline, downstream of the jet mixer. Alternatively, as shown, the sample may be collected, using a flow-through-cell sampler, from the return loop, through which a portion of the uniformly mixed fluid in the pipeline is pumped to the jet mixing nozzle(s), or from a bypass loop connected to the return loop.

Journal ArticleDOI
TL;DR: This paper examines how architecture, the definition of the instruction set and other facilities that are available to the user, can influence the implementation of a very large scale integration (VLSI) microsystem.
Abstract: This paper examines how architecture, the definition of the instruction set and other facilities that are available to the user, can influence the implementation of a very large scale integration (VLSI) microsystem. The instruction set affects the system implementation in a number of direct ways. The instruction formats determine the complexity of instruction decoding. The addressing modes available determine not only the hardware needed (multiported register files or three-operand adders), but also the complexity of the overall machine pipeline as greater variability is introduced in the time it takes to obtain an operand. Naturally, the actual operations specified by the instructions determine the hardware needed by the execution unit. In a less direct way, the architecture also determines the memory bandwidth required. A few key parameters are introduced that characterize the architecture and can be simply obtained from a typical workload. These parameters are used to analyze the memory bandwidth required and indicate whether the system is CPU- or memory-limited at a given design point. The implications of caches and virtual memories are also briefly considered.
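
The paper's style of analysis, converting a memory bandwidth budget into a sustainable instruction rate via a per-instruction traffic parameter and comparing it to the CPU's peak rate, can be sketched as below. All numbers in the usage example are hypothetical, not taken from the paper:

```python
def limiting_resource(cpu_instr_rate, mem_bandwidth, bytes_per_instr):
    """Decide whether a design point is CPU- or memory-limited.

    bytes_per_instr is a workload parameter: the mean memory traffic
    (instruction fetch plus data) generated per executed instruction.
    It converts raw bandwidth into a sustainable instruction rate, which
    is then compared against the CPU's peak rate."""
    mem_instr_rate = mem_bandwidth / bytes_per_instr
    rate = min(cpu_instr_rate, mem_instr_rate)
    limit = "memory" if mem_instr_rate < cpu_instr_rate else "CPU"
    return rate, limit
```

For example, a hypothetical 10 MIPS datapath fed by 40 MB/s of memory bandwidth, on a workload averaging 6 bytes of traffic per instruction, sustains only about 6.7 MIPS and is memory-limited.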

Proceedings ArticleDOI
T. Nukiyama1, T. Kusano, K. Matsumoto, H. Kurokawa, T. Hoshi, H. Goto, T. Temma 
01 Jan 1984
TL;DR: A 6.9mm×7mm chip with 115K transistors, produced in 1.75μm E/D NMOS technology, enables a 3×3 convolution of a 512 × 512 gray image to be performed in 3 seconds by one chip or 1.1 seconds using three cascaded chips.
Abstract: A 6.9mm×7mm chip with 115K transistors, produced in 1.75μm E/D NMOS technology, will be covered. Data flow architecture with a 10MHz clock rate enables a 3×3 convolution of a 512 × 512 gray image to be performed in 3 seconds by one chip or 1.1 seconds using three cascaded chips.
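
The workload in question is a direct 3×3 neighborhood operation per output pixel. A minimal Python sketch of the computation (correlation form, borders skipped for simplicity; the chip's data-flow pipeline is not modeled):

```python
def convolve3x3(image, kernel):
    """Direct 3x3 neighborhood operation on a gray image, given as a
    list of rows. For each interior pixel, accumulate the sum of
    products of the kernel with the 3x3 neighborhood (correlation
    form). Border pixels are left at zero."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            out[r][c] = acc
    return out
```

Each output pixel costs nine multiply-accumulates, so a 512×512 image requires roughly 2.3 million such operations per kernel pass, the volume of work the pipelined chip streams through at its 10 MHz clock.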

DOI
01 Jan 1984
TL;DR: The architectural specification of LINC is given and justified by some application examples; the chip makes it possible to implement a variety of high-performance processing elements with much reduced package counts.
Abstract: The link and interconnection chip (LINC) is a custom chip whose function is to serve as an efficient link between system functional modules, such as arithmetic units, register files, and I/O ports. LINC has 4-bit datapaths consisting of an 8x8 crossbar interconnection, a FIFO or programmable delay for each of its inputs, and a pipeline register file for each of its outputs. Using pre-stored control patterns LINC can configure its interconnection and delays on-the-fly, while running. Therefore the usual functions of buses and register files can be realized with this single chip. LINC can be used in a bit-sliced fashion to form interconnections with datapaths wider than 4 bits. Moreover, by tri-stating the proper data output pins, multiple copies of LINC can form crossbar interconnections larger than 8x8. Operating at the target cycle time of 100 ns, LINC makes it possible to implement a variety of high-performance processing elements with much reduced package counts. This reduction of chip counts is especially significant for cost-effective implementations of those multiprocessors, such as systolic arrays, which call for large numbers of processing elements. This paper gives the architectural specification of LINC and justifies the specification by some application examples.

Journal Article
TL;DR: In this paper, the authors present the technological and architectural considerations behind the realization of high-speed supercomputers, which converge toward a parallel and pipelined processing architecture.
Abstract: A presentation of the technological and architectural considerations in the realization of high-speed supercomputers, converging toward a parallel and pipelined processing architecture.

Proceedings ArticleDOI
01 Dec 1984
TL;DR: The pipeline FFT implementation is explained and attention is focused on the current activity which involves developing a fixed point arithmetic version using CMOS multipliers and adders to reduce the power consumption.
Abstract: This paper describes recent progress in the implementation of a high speed Fast Fourier Transform (FFT) processor with state-of-the-art VLSI circuits. Initial efforts have produced FFT and inverse FFT processors that operate at data rates of up to 40 MHz (complex). The current implementation computes transforms of up to 16,384 points in length by means of the McClellan and Purdy radix 4 pipeline FFT algorithm. The arithmetic is performed by single chip 22 bit floating point adders and multipliers, while the interstage reordering is performed by delay commutators implemented with semi-custom VLSI. This paper explains the pipeline FFT implementation and focuses attention on our current activity, which involves developing a fixed point arithmetic version using CMOS multipliers and adders to reduce the power consumption.

Patent
Hiroyuki Izumisawa1
27 Sep 1984
TL;DR: In this article, a vector operation processing apparatus utilizes a plurality of vector registers in a pipeline computer architecture to store ordered data elements which are processed in a pipelined vector operation unit in response to a vector instruction which designates selected ones of the vector registers.
Abstract: A vector operation processing apparatus utilizes a plurality of vector registers in a pipeline computer architecture. The vector registers store ordered data elements which are processed in a pipelined vector operation unit in response to a vector instruction which designates selected ones of the vector registers. An input selection circuit is utilized for writing results of the operation performed by the pipelined vector operation unit into the vector registers which are designated by the vector instruction. A write control device is used for causing the writing operation to be performed exclusively on the plurality of vector registers in response to an indication by the vector instruction.