
Showing papers on "Pipeline (computing) published in 1987"


Journal ArticleDOI
TL;DR: The application of a genetic algorithm to the steady state optimization of a serial liquid pipeline is considered and computer results show surprising speed as near-optimal results are obtained after examining a small fraction of the search space.
Abstract: The application of a genetic algorithm to the steady state optimization of a serial liquid pipeline is considered. Genetic algorithms are search procedures based upon the mechanics of natural genetics...

264 citations


Patent
26 Aug 1987
TL;DR: In this paper, the authors present an apparatus for transferring results in the result register to the second unit, in which a plurality of registers connected to the result register is provided, each storing the result from at least one flow of the first pipeline together with control information.
Abstract: In a pipeline data processing machine having a first unit for execution of instructions running according to a first pipeline and a second unit for storing data from a plurality of ports running according to a second pipeline, the first unit having a result register for holding results including data and address information of a flow of the first pipeline, the present invention provides an apparatus for transferring results in the result register to the second unit. A plurality of registers connected to the result register is provided, each storing the result from at least one flow of the first pipeline together with control information. Further, a controller in communication with the second unit and the plurality of ports responsive to the control information and a flow of the second pipeline is included for selecting one of the plurality of ports in a first-in, first-out queue as a port to the second unit and for updating the control information.

204 citations


Patent
13 Nov 1987
TL;DR: In this article, a multinode parallel-processing computer is made up of a plurality of interconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors and Special Purpose Processors.
Abstract: A multinode parallel-processing computer is made up of a plurality of interconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. The nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.

187 citations


Proceedings ArticleDOI
01 Jun 1987
TL;DR: In this paper, a new method of implementing branch instructions is presented, called Branch Folding, which can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, as well as eliminating pipeline breakage.
Abstract: A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. With a combination of hardware and software techniques the execution time cost for many branches can be effectively reduced to zero. Branches are folded into other instructions, making their execution as separate instructions unnecessary. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, as well as reducing or eliminating pipeline breakage. Statistics are presented demonstrating the effectiveness of Branch Folding and associated techniques used in the CRISP Microprocessor.

129 citations
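The folding mechanism described above can be sketched in a few lines: a decoded-instruction entry carries a next-PC field, so an unconditional branch is absorbed into the instruction that precedes it and never issues on its own. This is an illustrative sketch only; the instruction format and names below are invented, not CRISP's.

```python
# Hypothetical sketch of Branch Folding: each decoded entry carries a next-PC
# field, so an unconditional branch is absorbed ("folded") into the preceding
# instruction and never occupies a pipeline slot of its own.

def fold_branches(program):
    """program: list of (op, arg) tuples; returns decoded entries with next-PC."""
    folded = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if pc + 1 < len(program) and program[pc + 1][0] == "jmp":
            # Fold the following unconditional branch into this entry.
            folded.append({"op": op, "arg": arg, "next_pc": program[pc + 1][1]})
            pc += 2  # the jmp is consumed; it never issues separately
        else:
            folded.append({"op": op, "arg": arg, "next_pc": pc + 1})
            pc += 1
    return folded

prog = [("add", 1), ("jmp", 0), ("sub", 2)]
entries = fold_branches(prog)
# The add/jmp pair becomes one entry whose next_pc is the branch target.
```

Folding the `jmp` removes one instruction from the dynamic stream, which matches the abstract's claim that the apparent instruction count drops by the number of branches.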


Patent
04 Nov 1987
TL;DR: In this paper, a pipeline of polygon processors coupled in series is used for representing 3D objects on a monitor, with each polygon having its position determined by the first scan line on which it appears.
Abstract: A graphic processing system for representing three-dimensional objects on a monitor which uses a pipeline of polygon processors coupled in series. The three-dimensional objects are converted into a group of two-dimensional polygons. These polygons are then sorted to put them in scan line order, with each polygon having its position determined by the first scan line on which it appears. Before each scan line is processed, the descriptions of the polygons beginning on that scan line are sent into a pipeline of polygon processors. Each polygon processor accepts one of the polygon descriptions and stores it for comparison to the pixels of that scan line which are subsequently sent along the polygon processor pipeline. For each new scan line, polygons which are no longer covered are eliminated and new polygons are entered into the pipe. After each scan line is processed, the pixels can be sent directly to the CRT or can be stored in a frame buffer for later accessing. Two polygon processor pipelines can be arranged in parallel to process two halves of a display screen, with one pipeline being loaded while the other is processing. A frame buffer and frame buffer controller are provided for overflow conditions where two passes through the polygon pipeline are needed. A unique clipping algorithm forms a guardband space around a viewing space and clips only polygons intersecting both shells. Extra areas processed are simply not displayed.

111 citations


Journal ArticleDOI
01 Oct 1987
TL;DR: The architecture and implementation of the ZS-1 central processor is described, beginning with some of the basic design objectives, and descriptions of the instruction set, pipeline structure, and virtual memory implementation demonstrate the methods used to satisfy the objectives.
Abstract: The Astronautics ZS-1 is a high speed, 64-bit computer system designed for scientific and engineering applications. The ZS-1 central processor uses a decoupled architecture, which splits instructions into two streams---one for fixed point/memory address computation and the other for floating point operations. The two instruction streams are then processed in parallel. Pipelining is also used extensively throughout the ZS-1.This paper describes the architecture and implementation of the ZS-1 central processor, beginning with some of the basic design objectives. Descriptions of the instruction set, pipeline structure, and virtual memory implementation demonstrate the methods used to satisfy the objectives. High performance is achieved through a combination of static (compile-time) instruction scheduling and dynamic (run-time) scheduling. Both types of scheduling are illustrated with examples.

101 citations
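The decoupling described above can be sketched with an architectural queue between the two instruction streams: the address stream issues loads and pushes results into a FIFO that the floating-point stream consumes, letting memory access slip ahead of computation. This is a toy model under invented names, not the ZS-1's actual mechanism.

```python
# Minimal sketch of a decoupled access/execute organization: the queue
# discipline between the two streams is the point; everything else is invented.
from collections import deque

memory = {100: 2.0, 104: 3.0, 108: 4.0}
load_queue = deque()

def access_stream(addresses):
    # Fixed-point/address side: compute addresses and issue loads.
    for a in addresses:
        load_queue.append(memory[a])

def execute_stream(n):
    # Floating-point side: pop operands from the queue as they arrive.
    total = 0.0
    for _ in range(n):
        total += load_queue.popleft()
    return total

access_stream([100, 104, 108])   # access side runs ahead of the consumer
result = execute_stream(3)       # 9.0
```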


Patent
Shibuya Toshiteru1
05 Jan 1987
TL;DR: In this article, a data processing system capable of processing instructions under pipeline control in a plurality of stages including an executing stage, an instruction prefetching device comprises a prediction checking circuit (66, 67) coupled to a predicting circuit (52, 53) and an instruction executing circuit (32, 33, 37, 38) and a prefetch controlling circuit (47, 86).
Abstract: In a data processing system capable of processing instructions under pipeline control in a plurality of stages including an executing stage, an instruction prefetching device comprises a prediction checking circuit (66, 67) coupled to a predicting circuit (52, 53) and an instruction executing circuit (32, 33, 37, 38) and a prefetch controlling circuit (47, 86) coupled to the predicting circuit and the checking circuit. In one of the stages that is prior to the executing stage, the checking circuit (66, 67) checks whether or not a prediction for a branch destination is correct. If the prediction is correct, prefetch is continued according to the prediction. If the prediction is an incorrect prediction, the prefetch is continued according to a correct prediction with the incorrect prediction corrected immediately after the executing stage. Check of the prediction may be for an instruction other than branch instructions, for either an unconditional branch instruction or a branch count instruction, for a branch destination address, or for a branch direction which becomes clear after the executing stage.

88 citations


Proceedings ArticleDOI
01 Jun 1987
TL;DR: This paper examines the design of a second generation VLSI RISC processor, MIPS-X, and examines several key areas, including the organization of the on-chip instruction cache, the coprocessor interface, branches and the resulting branch delay, and exception handling.
Abstract: The design of a RISC processor requires a careful analysis of the tradeoffs that can be made between hardware complexity and software. As new generations of processors are built to take advantage of more advanced technologies, new and different tradeoffs must be considered. We examine the design of a second generation VLSI RISC processor, MIPS-X. MIPS-X is the successor to the MIPS project at Stanford University and like MIPS, it is a single-chip 32-bit VLSI processor that uses a simplified instruction set, pipelining and a software code reorganizer. However, in the quest for higher performance, MIPS-X uses a deeper pipeline, a much simpler instruction set and achieves the goal of single cycle execution using a 2-phase, 20 MHz clock. This has necessitated the inclusion of an on-chip instruction cache and careful consideration of the control of the machine. Many tradeoffs were made during the design of MIPS-X and this paper examines several key areas. They are: the organization of the on-chip instruction cache, the coprocessor interface, branches and the resulting branch delay, and exception handling. For each issue we present the most promising alternatives considered for MIPS-X and the approach finally selected. Working parts have been received and this gives us a firm basis upon which to evaluate the success of our design.

78 citations


Journal ArticleDOI
TL;DR: This paper proposes applying an old but rarely used architectural approach to the design of single-chip signal processors so that the potential benefits of extensive pipelining can be fully realized.
Abstract: Programmable processors specialized to intensive numerical computation and real-time signal processing are often deeply pipelined. The ensuing gain in throughput is moderated by the difficulty of efficiently programming such processors. Techniques for overcoming this difficulty are effective only for modest amounts of pipelining. This paper proposes applying an old but rarely used architectural approach to the design of single-chip signal processors so that the potential benefits of extensive pipelining can be fully realized. The architectural approach is to interleave multiple processes (or programs) through a single deeply pipelined processor in such a way that the disadvantages of deep pipelining disappear. Instead, the user is faced with the need to construct programs that can execute as concurrent processes. The main advantage is that much more pipelining can be used without aggravating the programming. A specific experimental architecture is outlined. The solution offered is a "system solution" in that architectural performance is considered along with programmability and ease of use. In the companion paper, data flow programming is suggested so that algorithms can be automatically partitioned for concurrent execution. Data flow provides a natural environment in which to build signal processing programs and can be supported efficiently in an architecture of the type described here.

62 citations


Patent
Iwasaki Junichi1, Hisao Harigai1
20 Apr 1987
TL;DR: In this article, the time-series data captured at a microprocessor's input/output terminals is edited by using the output of a status flip-flop to discriminate whether a given bus cycle follows an instruction on or before, or on or after, a predetermined instruction.
Abstract: A microprocessor having a multi-stage pipeline structure, comprises: a status flip-flop having its output changing when the instruction code of a predetermined instruction is decoded in the microprocessor; a circuit for outputting the output of the status flip-flop in synchronism with the output timing of an address for the bus cycle period of the microprocessor; and a circuit for sequentially storing the information, which appears at the input/output terminals of the microprocessor, as time-series data outside of the microprocessor. The time-series data is edited by discriminating, with reference to the information outputted from the status flip-flop inside the microprocessor to the outside, whether a given bus cycle of the microprocessor follows an instruction on or before the predetermined instruction that changes the output of the status flip-flop, or an instruction on or after it.

61 citations


Journal ArticleDOI
TL;DR: In this two-paper series, techniques connected with artificial intelligence and genetics are applied to achieve computer-based control of gas pipeline systems to solve two classical pipeline optimization problems, the steady serial line problem, and the single transient line problem.
Abstract: In this two-paper series, techniques connected with artificial intelligence and genetics are applied to achieve computer-based control of gas pipeline systems. In this, the first paper, genetic algorithms are developed and applied to the solution of two classical pipeline optimization problems, the steady serial line problem, and the single transient line problem. Simply stated, genetic algorithms are canonical search procedures based on the mechanics of natural genetics. They combine a Darwinian survival of the fittest with a structured, yet randomized, information exchange between artificial chromosomes (strings). Despite their reliance on stochastic processes, genetic algorithms are no simple random walk; they carefully and efficiently exploit historic information to guide future trials. In the two pipeline problems, a simple three-operator genetic algorithm consisting of reproduction, crossover, and mutation finds near-optimal performance quickly. In the steady serial problem, near-optimal performance is found after searching less than 1100 of 1.1 × 10^12 alternatives. Similarly, efficient performance is demonstrated in the transient problem. Genetic algorithms are ready for application to more complex engineering optimization problems. They also can serve as a learning mechanism in a larger rule learning procedure. This application is discussed in the sequel.
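As a rough illustration of the three-operator algorithm named above, the toy run below applies reproduction (fitness-proportional selection), crossover, and mutation to a stand-in bit-counting objective; the fitness function and every parameter here are placeholders, not the paper's pipeline cost model.

```python
# Toy three-operator genetic algorithm: reproduction, crossover, mutation.
# The objective (count of 1-bits in a 16-bit string) is purely illustrative.
import random

random.seed(1)
BITS, POP, GENS = 16, 20, 40

def fitness(s):
    return sum(s)

def select(pop):
    # Reproduction: fitness-proportional (roulette-wheel) selection.
    return random.choices(pop, weights=[fitness(s) or 1 for s in pop])[0]

def crossover(a, b):
    cut = random.randrange(1, BITS)   # single-point crossover
    return a[:cut] + b[cut:]

def mutate(s, rate=0.01):
    return [bit ^ (random.random() < rate) for bit in s]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
best = max(fitness(s) for s in pop)
for _ in range(GENS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]
    best = max(best, max(fitness(s) for s in pop))
# Near-optimal strings emerge after sampling a tiny fraction of the 2^16 space.
```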

Journal ArticleDOI
TL;DR: This paper presents a set of trace reductions that facilitates the task of accurately characterizing a trace tape with simple statistics by simplifying the corresponding data-dependency graph, and shows that the reduced graph can be accurately characterized by simple statistics.
Abstract: The nature by which branches and data dependencies generate delays that degrade pipeline performance is investigated in this paper. We show that for the general execution trace, few specific delays can be considered in isolation; rather, the magnitude of any specific delay may depend on the relative proximity of other delays. This phenomenon can make the task of accurately characterizing a trace tape with simple statistics intractable. We present a set of trace reductions that facilitates this task by simplifying the corresponding data-dependency graph. The reductions operate on multiple data-dependency arcs and branches in conjunction; those arcs whose performance implications are redundant with respect to the dependency graph are identified, and eliminated from the graph. We show that the reduced graph can be accurately characterized by simple statistics. We use these statistics to show that as the length of a pipeline increases, the performance degradation due to data dependencies and branches increases monotonically. However, lengthening the pipeline may correspond to decreasing the cycle time of the pipeline. These two opposing effects are used in conjunction to derive an equation for optimal pipeline length for a given trace tape. The optimal pipeline length is shown to be characterized by n = √(γα), where γ is the ratio of overall circuit delay to latching overhead, and α is a function of the trace statistics that accounts for the delays induced by data dependencies and branches.
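The closing formula can be checked numerically; the values of γ and α below are assumed purely for illustration, not taken from the paper.

```python
# Optimal pipeline length n = sqrt(gamma * alpha): gamma is circuit delay over
# latching overhead, alpha summarizes dependency- and branch-induced delays
# from the trace statistics. Both numbers here are assumed for illustration.
import math

gamma = 64.0   # total circuit delay / latch overhead (assumed)
alpha = 0.25   # trace-dependent delay factor (assumed)

n_opt = math.sqrt(gamma * alpha)
print(n_opt)   # 4.0: deeper pipelines cut cycle time but amplify hazard delays
```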

Patent
08 Apr 1987
TL;DR: In this article, a method and apparatus for determining the water content of crude oil in a pipeline is described, which consists of S-band transmitting and receiving antennas, and X-band transmitting and receiving antennas.
Abstract: A method and apparatus for determining the water content of crude oil in a pipeline is disclosed. The device consists of S-band transmitting and receiving antennas, and X-band transmitting and receiving antennas. These are used to determine the complex dielectric constant of the fluid in a pipeline. Water salinity and an adjustment to the mixing formula are calculated using X-band and S-band sidewall links. The overall water content of the pipeline can then be determined by using an S-band main link that transmits a wave through a representative portion of the entire pipeline.

Patent
29 Sep 1987
TL;DR: In this paper, a data processing system is described in which the available technology is used to provide high performance by having a four-level pipeline for the central processing system, a simplified instruction set and an interface with the coprocessor unit.
Abstract: A data processing system is described in which the available technology is used to provide high performance. The high performance is achieved by having a four-level pipeline for the central processing system, a simplified instruction set, and a coprocessor unit with a simple and efficient interface to normal instruction execution. The apparatus implementing the central processing system is closely connected to the instruction set. A discussion of the implementation of the data processing system is provided.

Journal ArticleDOI
TL;DR: Algorithms for computing image transforms and features such as projections along linear patterns, convex hull approximations, Hough transform for line detection, diameter, moments, and principal components suitable for implementation in image analysis pipeline architectures are presented.
Abstract: In this correspondence, some image transforms and features such as projections along linear patterns, convex hull approximations, Hough transform for line detection, diameter, moments, and principal components will be considered. Specifically, we present algorithms for computing these features which are suitable for implementation in image analysis pipeline architectures. In particular, random access memories and other dedicated hardware components which may be found in the implementation of classical techniques are no longer needed in our algorithms. The effectiveness of our approach is demonstrated by running some of the new algorithms in conventional short-pipelines for image analysis. In related papers, we have shown a pipeline architecture organization called PPPE (Parallel Pipeline Projection Engine), which unleashes the power of projection-based computer vision, image processing, and computer graphics. In the present correspondence, we deal with just a few of the many algorithms which can be supported in PPPE. These algorithms illustrate the use of the Radon transform as a tool for image analysis.

Proceedings ArticleDOI
01 Jun 1987
TL;DR: The WISQ architecture is described, designed to achieve high performance by exploiting new compiler technology and using a highly segmented pipeline, and ways to further reduce the effects of branches by not having them executed in the execution unit are studied.
Abstract: In this paper, the WISQ architecture is described. This architecture is designed to achieve high performance by exploiting new compiler technology and using a highly segmented pipeline. By having a highly segmented pipeline, a very-high-speed clock can be used. Since a highly segmented pipeline will require relatively long pipelines, a way must be provided to minimize the effects of pipeline bubbles that are formed due to data and control dependencies. It is also important to provide a way of supporting precise interrupts. These goals are met, in part, by providing a reorder buffer to help restore the machine to a precise state. The architecture then makes the pipelining visible to the programmer/compiler by making the reorder buffer accessible and by explicitly providing that issued instructions cannot be affected by immediately preceding ones. Compiler techniques have been identified that can take advantage of the reorder buffer and permit a sustained execution rate approaching or exceeding one per clock. These techniques include using trace scheduling and providing a relatively easy way to “undo” instructions if the predicted branch path is not taken. We have also studied ways to further reduce the effects of branches by not having them executed in the execution unit. In particular, branches are detected and resolved in the instruction fetch unit. Using this approach, the execution unit is sent a stream of instructions (without branches) that are guaranteed to execute.
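The reorder-buffer mechanism for precise state can be sketched as follows: results complete out of order, but architectural state is updated only at in-order retirement, so the machine can always be restored to a precise point. The structure and field names are invented for illustration and are not WISQ's.

```python
# Minimal reorder-buffer sketch: allocate at issue, complete out of order,
# commit strictly in issue order so architectural state stays precise.
from collections import OrderedDict

rob = OrderedDict()              # issue order: tag -> (dest, value or None)
registers = {"r1": 0, "r2": 0}   # architectural state

def issue(tag, dest):
    rob[tag] = (dest, None)      # allocate an entry at issue time

def complete(tag, value):
    dest, _ = rob[tag]
    rob[tag] = (dest, value)     # may happen out of order

def retire():
    # Commit only the oldest entries whose results are ready.
    while rob:
        tag, (dest, value) = next(iter(rob.items()))
        if value is None:
            break                # oldest not done: stop here to stay precise
        registers[dest] = value
        rob.pop(tag)

issue(0, "r1"); issue(1, "r2")
complete(1, 20)                  # younger instruction finishes first
retire()                         # nothing commits: r1 is still pending
complete(0, 10)
retire()                         # now both commit, in order
```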

Patent
06 Jul 1987
TL;DR: In this paper, a method is presented for producing a pipeline having a thermally insulating coating in which a continuous matrix of water-impermeable material has dispersed throughout it hollow microspheres or cellular particles which improve the heat-insulating properties of the basic matrix.

Abstract: A method for producing a pipeline having a thermally insulating coating in which a continuous matrix of water-impermeable material has dispersed throughout it hollow microspheres or cellular particles which improve the heat-insulating properties of the basic matrix.

Patent
13 Oct 1987
TL;DR: An asynchronous form of pipeline processor has been proposed in this article, where a control unit individually provides binary control signals to a plurality of processing apparatus to maintain order with respect to processing and storing data.
Abstract: An asynchronous form of pipeline processor has a storage capability for partially processed data. When the processor is empty, it functions as a combinatorial circuit producing resultant data processed as desired. As necessary, the processor registers data; however, it continues to advance other data as rapidly as possible. A control unit individually provides binary control signals to a plurality of processing apparatus to maintain order with respect to processing and storing data. Switching structures controlled by the control unit along with amplifiers are provided at the input and output of individual processors to set the processing apparatus to process or store data. Processing several sets of data simultaneously while preserving proper order enables the system to do logic and arithmetic processing at a relatively high speed.

Patent
27 May 1987
TL;DR: In this article, an interlock of an instruction processing pipeline in a data processing system responsive to the validity of the pipeline stages within the instruction unit pipeline under microprogram control, is provided.
Abstract: An interlock of an instruction processing pipeline in a data processing system responsive to the validity of the pipeline stages within the instruction unit pipeline under microprogram control, is provided. Thus, a microprogram can provide for the release of a particular pipeline stage based on a selected characteristic of the valid signals generated by other stages of the pipeline. An interlock control signal is generated by a decode of a field in a microinstruction stored in a control store RAM or through hardwired decoding.

Patent
28 Dec 1987
TL;DR: In this article, control information for an instruction that does not require a memory-operand fetch does not pass through the pipeline stage relating to the fetch, thereby improving bus bandwidth for memory accesses.
Abstract: A central processing unit includes an instruction decoder (1), an operand address computation unit (2), an operand pre-fetch unit (3), a control information buffer (5), an arithmetic unit (4), an instruction fetch unit (6), a chip bus (7), and a bus controller (8). The process relating to the fetch of a memory operand is independent from the main pipeline process, which has an instruction fetching stage, an instruction decoding stage, and an instruction execution stage. As a result, control information (13) for an instruction that does not require a memory-operand fetch does not pass through the pipeline stage relating to that fetch, thereby improving bus bandwidth for memory-operand accesses.

Journal ArticleDOI
TL;DR: Together, the learning classifier system with its complete rule and message system and powerful learning heuristic is capable of learning how to operate a pipeline under normal and abnormal conditions alike.
Abstract: In this two-paper series, techniques connected with artificial intelligence and genetics are applied to the problem of gas pipeline control. In the first paper, genetic algorithms were applied to two pipeline optimization problems. In this, the second paper, genetic algorithms are used as a basic learning mechanism in a larger rule learning system called a learning classifier system. The learning classifier system is developed and applied to the control of a gas pipeline under normal summer and winter operations as well as abnormal operations during leak events.

Journal ArticleDOI
TL;DR: It is shown how the edge computations in these two algorithms can be restructured into the form of a single, shared hardware pipeline, and data from a simulation of this processor suggests that the scanline processor can reduce computation time significantly.
Abstract: This paper proposes an architecture to support VLSI geometry checking tasks based on scanline algorithms. Rather than recast the entire verification task in hardware, we identify primitives around which geometry checking tools can be built, and examine the feasibility of accelerating two of these critical primitives. We focus on the operations of Boolean combinations of mask layers, and region numbering within a mask layer. Unlike previous proposals for special hardware (e.g., bit map processors), this architecture operates on a more realistic representation of masks: a sorted stream of possibly oblique edges. The architecture can be viewed as directly interpreting the operators that manipulate the relevant scanline data structures. We show how the edge computations in these two algorithms can be restructured into the form of a single, shared hardware pipeline. Data from a simulation of this processor suggests that, relative to the specific software functions it is intended to replace, the scanline processor can reduce computation time significantly. In particular, simulations of one possible implementation for this processor yield speedups of three orders of magnitude for Manhattan mask data, degrading gracefully to speedups of two orders of magnitude for highly oblique mask data.
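One of the two primitives identified above, Boolean combination of mask layers, can be sketched for a single scanline: with each layer's edge crossings given as sorted intervals, a merge with per-layer depth counters emits the AND spans. This is a software sketch of the operation under assumed input conventions, not the hardware pipeline itself.

```python
# Boolean AND of two mask layers on one scanline. Each layer is a list of
# sorted (x_enter, x_exit) spans; a merge over the edge events keeps a depth
# counter per layer and emits spans where both layers are present.

def scanline_and(layer_a, layer_b):
    events = sorted([(x0, 0, +1) for x0, _ in layer_a] +
                    [(x1, 0, -1) for _, x1 in layer_a] +
                    [(x0, 1, +1) for x0, _ in layer_b] +
                    [(x1, 1, -1) for _, x1 in layer_b])
    depth = [0, 0]
    out, start = [], None
    for x, layer, d in events:
        inside_before = depth[0] > 0 and depth[1] > 0
        depth[layer] += d
        inside_after = depth[0] > 0 and depth[1] > 0
        if not inside_before and inside_after:
            start = x                     # entering the AND region
        elif inside_before and not inside_after:
            out.append((start, x))        # leaving the AND region
    return out

spans = scanline_and([(0, 10), (20, 30)], [(5, 25)])
# [(5, 10), (20, 25)]
```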

Patent
Tadaaki Isobe1, Isobe Toshiko1
16 Nov 1987
TL;DR: An access instruction pipeline for receiving an access instruction for accessing data to be inputted to the pipeline of a vector processor includes a plurality of buffers for buffering a memory request and sending it to a storage control unit.
Abstract: An access instruction pipeline for receiving an access instruction for accessing data to be inputted to the pipeline of a vector processor includes a plurality of buffers for buffering a memory request and sending it to a storage control unit, and a detector for judging at the last stage of the plurality of buffers whether an instruction is an access instruction or a serialization instruction for serializing the memory access instructions among access instruction pipelines. If a serialization instruction is detected at the last stage of a pipeline, the pipelining operation is stopped, but instructions are filled up in the stopped pipeline. After a serialization instruction has been detected at the last stage of all the pipelines, the pipelining operation starts again.

Journal ArticleDOI
J. Sanz1, E. Hinkle
TL;DR: This paper proposes some new pipeline configurations which achieve a remarkable degree of parallelism in the computation of projection data and, in fact, of many other geometrical descriptors of digital images.
Abstract: This paper deals with the problem of computing projections of digital images. The novelty of our contribution is that we present algorithms which are suitable for implementation in general purpose image processing and image analysis pipeline architectures. No random access of the image memory is necessary. We propose some new pipeline configurations which achieve a remarkable degree of parallelism in the computation of projection data and, in fact, of many other geometrical descriptors of digital images. Fast computation of projections of digital images is not only important for extracting geometrical information from images, it also makes possible performing a large number of operations on images in Radon space, thereby reducing two-dimensional problems to a series of one-dimensional problems.
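The no-random-access property claimed above can be illustrated with a single raster-order pass that accumulates several projections as the pixels stream by, the way a pipeline stage could; the specific projections chosen here are illustrative.

```python
# Projections of a digital image computed in one strict raster-order pass,
# with no random access to image memory: horizontal, vertical, and one
# 45-degree diagonal projection are accumulated as pixels stream by.

def projections(image):
    h, w = len(image), len(image[0])
    row_proj = [0] * h             # sum along each row
    col_proj = [0] * w             # sum along each column
    diag_proj = [0] * (h + w - 1)  # sum along lines x + y = const
    for y in range(h):             # raster order: one pass, no seeks
        for x in range(w):
            v = image[y][x]
            row_proj[y] += v
            col_proj[x] += v
            diag_proj[x + y] += v
    return row_proj, col_proj, diag_proj

img = [[1, 2],
       [3, 4]]
rows, cols, diags = projections(img)
# rows = [3, 7], cols = [4, 6], diags = [1, 5, 4]
```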

Patent
16 Jul 1987
TL;DR: In this paper, the authors propose to confirm that a desired pipeline is constituted by providing a connecting means and a connection confirming means in each module in each pipeline, where each module receives the connection confirming signal from the designated output bus D, and sends back the connected confirming signal to the designated input bus D and executes this connection confirming operation in a chain.
Abstract: PURPOSE:To confirm that a desired pipeline is constituted by providing a connecting means and a connection confirming means in each module. CONSTITUTION:Input and output buses R, W are designated for a desired module M. Before pipeline processing starts, if a module is in the ready state and its designated input bus D is active, the operation of outputting an active signal to the designated output bus is executed successively from the leading module M, and the desired modules M are connected in a chain. The connecting operation is inverted when it arrives at the final module M and switches to an operation that outputs a connection confirming signal to the output bus D. Each module M receives the connection confirming signal from the designated output bus D, sends back the connection confirming signal to the designated input bus D, and executes this connection confirming operation in a chain. Establishment of the desired pipeline is confirmed by the fact that the connection confirming signal is outputted from the leading module M, and pipeline processing is started.

Journal ArticleDOI
TL;DR: The implementation and architecture of a 172,163-transistor single-chip general-purpose 32-b microprocessor is described, which is capable of a peak execution rate of over one instruction/clock.
Abstract: The implementation and architecture of a 172,163-transistor single-chip general-purpose 32-b microprocessor is described. The 16-MHz chip is fabricated using a single-metal double-poly 1.75-μm CMOS technology and is capable of a peak execution rate of over one instruction/clock. Multiple on-chip caches, pipelining, and a one-cycle I/O protocol are utilized.

Proceedings ArticleDOI
01 Jan 1987
TL;DR: A Reduced Instruction Set Computer with a 5-stage pipeline, implemented with 150K transistors on an 8mm×8.5mm chip in a 2μm, 2-layer-metal CMOS process, achieves a 12MIPS performance.
Abstract: A Reduced Instruction Set Computer with a 5-stage pipeline, implemented with 150K transistors on an 8mm×8.5mm chip in a 2μm, 2-layer-metal CMOS process, will be reported. At an operational frequency of 20MHz, a 12MIPS performance has been achieved.
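The relation between clock rate, pipeline depth, and MIPS in abstracts like this one can be sketched with a simple timing model. The stage names and the stall figure below are illustrative, not from the paper: a 5-stage pipeline at 20MHz reaching 12 MIPS implies roughly 1.67 cycles per instruction on average.

```python
# Minimal timing model of a classic 5-stage pipeline (IF, ID, EX, MEM, WB).
# With n instructions and no stalls the pipeline needs n + STAGES - 1
# cycles, so throughput approaches one instruction per clock.

STAGES = 5

def pipeline_cycles(n_instructions, stall_cycles=0):
    # Fill the pipeline once, then retire one instruction per clock,
    # plus any stall cycles inserted by hazards.
    return n_instructions + STAGES - 1 + stall_cycles

def mips(clock_mhz, n_instructions, stall_cycles=0):
    # Millions of instructions per second over the whole run.
    return clock_mhz * n_instructions / pipeline_cycles(n_instructions, stall_cycles)

print(round(mips(20, 1_000_000), 2))                        # approaches 20 MIPS with no stalls
print(round(mips(20, 1_000_000, stall_cycles=666_667), 2))  # ~12 MIPS at ~0.67 stalls/instruction
```

The gap between the 20 MIPS peak and the reported 12 MIPS is thus accounted for by about two stall cycles per three instructions, whatever their actual source (branches, memory waits, and so on).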

Proceedings ArticleDOI
18 May 1987
TL;DR: A pipeline networking approach to designing a Chebyshev polynomial evaluator for the fast evaluation of elementary functions over a string of arguments is presented.
Abstract: Fast evaluation of vector-valued elementary functions plays a vital role in many real-time applications. In this paper, we present a pipeline networking approach to designing a Chebyshev polynomial evaluator for the fast evaluation of elementary functions over a string of arguments. In particular, pipeline nets are employed to perform the preprocessing and postprocessing of various elementary functions to boost the overall system performance. Design tradeoffs are analyzed among representational accuracy, processing speed and hardware complexity.
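The arithmetic core of a Chebyshev polynomial evaluator can be illustrated with Clenshaw's recurrence, applied over a "string of arguments". The scalar formulation and the coefficients below (a truncated Chebyshev expansion of exp(x) on [-1, 1]) are an illustrative sketch only; the paper's pipeline-network mapping is not modeled here.

```python
import math

# Clenshaw's recurrence evaluates a Chebyshev series sum c_k T_k(x)
# without forming the T_k explicitly: b_k = 2x*b_{k+1} - b_{k+2} + c_k.

def clenshaw(coeffs, x):
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):     # c_n down to c_1
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]     # final step absorbs c_0

# Chebyshev coefficients of exp(x) on [-1, 1], truncated at degree 5
# (c_0 is already the constant term of the series).
coeffs = [1.2660658777520084, 1.1303182079849703, 0.27149533953407656,
          0.044336849848663804, 0.005474240442093732, 0.0005429263119139438]

# Evaluate over a "string of arguments", as the evaluator would stream them.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
approx = [clenshaw(coeffs, x) for x in xs]
print(max(abs(a - math.exp(x)) for a, x in zip(approx, xs)))  # small truncation error
```

Adding coefficients improves representational accuracy at the cost of more multiply-add steps per argument, which is exactly the accuracy/speed/hardware tradeoff the abstract refers to.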

Proceedings ArticleDOI
01 Dec 1987
TL;DR: A dynamic mode data-driven execution-scheme with special emphasis on general design considerations for VLSI-oriented implementation is presented and the Q-v1's basic self-timed elastic data-transfer mechanism gives rise to a unique “flow-thru processing” concept.
Abstract: This paper describes the VLSI design considerations and basic hardware structure of a one-chip data-driven processor Q-v1 which can process high flow-rate data-streams in a dynamic mode of execution. Since the Q-v1 is primarily designed to be a functional VLSI component that is easily programmable to perform various dedicated processing functions, special design considerations were used to realize high on-chip data-flow capability by extensive utilization of an elastic pipeline structure. This paper first presents a dynamic mode data-driven execution-scheme with special emphasis on general design considerations for VLSI-oriented implementation. This paper also presents the Q-v1's basic self-timed elastic data-transfer mechanism. Its application to various functional modules gives rise to a unique “flow-thru processing” concept. That is, all processing, associative and selective packet-transfer functions are carried out in a highly parallel fashion by the elastic packetized data-flows through distributively and autonomously controlled elastic pipelines.
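The elastic-pipeline idea can be sketched behaviorally: data advance only when the downstream stage has buffer room, so the pipeline stretches and compresses with the data flow. This is a rough software analogy under assumed stage functions and buffer depths, not a model of the Q-v1's self-timed hardware.

```python
from collections import deque

# Each elastic stage holds a small FIFO; a packet moves forward only when
# the next stage's FIFO has room, mimicking a self-timed handshake.

class ElasticStage:
    def __init__(self, fn, depth=2):
        self.fn = fn                       # processing applied to each packet
        self.buf = deque(maxlen=depth)     # elastic buffer for this stage

    def can_accept(self):
        return len(self.buf) < self.buf.maxlen

    def push(self, packet):
        self.buf.append(self.fn(packet))

def tick(stages, source, sink):
    # Transfers happen back-to-front so a bubble opened at the sink can
    # ripple upstream within one tick, as in a self-timed pipeline.
    if stages[-1].buf:
        sink.append(stages[-1].buf.popleft())
    for up, down in zip(reversed(stages[:-1]), reversed(stages[1:])):
        if up.buf and down.can_accept():
            down.push(up.buf.popleft())
    if source and stages[0].can_accept():
        stages[0].push(source.popleft())

stages = [ElasticStage(lambda p: p + 1), ElasticStage(lambda p: p * 2)]
source, sink = deque(range(5)), []
while len(sink) < 5:
    tick(stages, source, sink)
print(sink)  # each packet flowed through (+1) then (*2), in order
```

Because each stage only consults its neighbor's buffer occupancy, control stays distributed and autonomous, which is the "flow-thru processing" notion the abstract describes.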

Book ChapterDOI
01 Jan 1987
TL;DR: The general development philosophy is described for the TX series, which consists of a basic core processor, higher-performance ones, and superintegrated autonomous derivative processors, all of which are designed on the single TRONCHIP architecture.
Abstract: The general development philosophy is described for our TX series, which consists of a basic core processor, higher-performance ones, and superintegrated autonomous derivative processors. All these processors are designed on the single TRONCHIP architecture. The core processor TX1 is designed to be widely used in controllers for highly intelligent machines. The TX1 pipeline structure and its performance simulation are discussed intensively, and they indicate a performance of more than five MIPS. The higher-performance processor TX3 contains a memory management unit and a 16K-byte cache memory on chip and achieves over ten MIPS, including basic floating-point instructions. As the first example of TX-series superintegration, the organization of a LAN processor is discussed, which integrates Token Ring controller logic, high-speed RAM, and a TX1 as a network processor. Lastly, our basic idea is described for the application support systems, which include a real-time OS nucleus.