
Showing papers on "Pipeline (computing)" published in 1994


Journal ArticleDOI
TL;DR: The CFPP architecture and a proposal for an asynchronous implementation are presented; the architecture seeks geometric regularity in processor chip layout, purely local control to avoid the performance limitations of complex global pipeline stall signals, and simplicity that might lead to provably correct processor designs.
Abstract: The counterflow pipeline processor architecture (CFPP) is a proposal for a family of microarchitectures for RISC processors. The architecture derives its name from its fundamental feature, namely that instructions and results flow in opposite directions within a pipeline and interact as they pass. The architecture seeks geometric regularity in processor chip layout, purely local control to avoid performance limitations of complex global pipeline stall signals, and simplicity that might lead to provably correct processor designs. Moreover, CFPP designs allow asynchronous implementations, in contrast to conventional pipeline designs where the synchronization required for operand forwarding makes asynchronous designs unattractive. This paper presents the CFPP architecture and a proposal for an asynchronous implementation. Detailed performance simulations of a complete processor design are not yet available.
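
A toy Python model makes the counterflow principle concrete: instructions move up the pipeline while results move down, and an instruction picks up a matching operand from any result it meets or crosses. This is an illustrative sketch of the idea only; the stage structure and field names are assumptions, not the paper's design.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    dst: str
    srcs: dict = field(default_factory=dict)  # source reg -> value (None = pending)

@dataclass
class Result:
    reg: str
    value: int

def counterflow_step(instr_pipe, result_pipe):
    """One cycle: instructions flow up, results flow down, interacting as they pass."""
    n = len(instr_pipe)
    for i, instr in enumerate(instr_pipe):
        if instr is None:
            continue
        # The instruction meets a result in its own stage, or crosses one moving
        # the opposite way; either way it picks up a matching pending operand.
        for j in (i, i + 1):
            res = result_pipe[j] if j < n else None
            if res and res.reg in instr.srcs and instr.srcs[res.reg] is None:
                instr.srcs[res.reg] = res.value
    # Shift: instructions toward higher stages, results toward lower ones.
    return [None] + instr_pipe[:-1], result_pipe[1:] + [None]

pipe_i = [Instr("r3", {"r1": None}), None, None]
pipe_r = [None, None, Result("r1", 42)]
for _ in range(2):
    pipe_i, pipe_r = counterflow_step(pipe_i, pipe_r)
print(pipe_i[2].srcs)  # {'r1': 42} -- operand picked up as the two streams met
```

Note that each stage consults only its own occupants and its neighbor's, which is exactly the purely local control the abstract describes.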

180 citations


Patent
21 Oct 1994
TL;DR: In this paper, an external command mode for directly accessing the execution unit, responsive to externally generated commands and instructions, is presented, where the user can examine and modify registers, memory, and I/O space without otherwise affecting their contents.
Abstract: A microprocessor is disclosed herein having an external command mode for directly accessing the execution unit, responsive to externally generated commands and instructions. An external instruction path is provided, as well as a conventional processor-driven instruction path. A multiplexer is provided that selects which of the instruction paths is actually supplied to the execution unit. Using the external command mode, the user can examine and modify registers, memory, and I/O space without otherwise affecting their contents. Any instruction executable by the execution unit is executable in the external command mode. Because direct access is provided into the execution unit, there is no implicit updating that would otherwise affect the state of the processor and require saving to an alternate memory. The present invention is implemented with a conventional test access port designed in accordance with the IEEE 1149.1 boundary scan standard, modified to include an instruction register, a data register, and control logic. The external command mode is applicable to single and multiple pipeline processors. The circuit described herein includes several selectors for selecting between the probe mode and the processor-driven mode of operation, including an external pin, an external command, and a debug exception. For ascertaining whether the circuit is in the external command mode, an acknowledge pin is provided to indicate when the execution unit is ready to accept an instruction in the probe mode.

173 citations


Journal ArticleDOI
01 Dec 1994
TL;DR: The implementation of a 200 MHz 13.3 mm² 8×8 2-D DCT macrocell capable of HDTV rates, based on a direct realization of the DCT and using distributed arithmetic, is presented.
Abstract: The two-dimensional discrete cosine transform (2-D DCT) has been widely recognized as a key processing unit for image data compression/decompression. In this paper, the implementation of a 200 MHz 13.3 mm² 8×8 2-D DCT macrocell capable of HDTV rates, based on a direct realization of the DCT and using distributed arithmetic, is presented. The macrocell, fabricated using 0.8 μm base-rule CMOS technology and 0.5 μm MOSFETs, performs the DCT processing with a one-sample-(pixel)-per-clock throughput. The fast speed and small area are achieved by a novel sense-amplifying pipeline flip-flop (SA-F/F) circuit technique in combination with nMOS differential logic. The SA-F/F, a class of delay flip-flops, can be used as a differential synchronous sense amplifier, and can amplify dual-rail inputs with swings lower than 100 mV. A 1.6 ns 20-bit carry-skip adder used in the DCT macrocell, designed by the same scheme, is also described. The adder is 50% faster and 30% smaller than a conventional CMOS carry-lookahead adder, which reduces the macrocell size by 15% compared to a conventional CMOS implementation.
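
Distributed arithmetic, the technique behind the macrocell, replaces multipliers with a precomputed lookup table addressed by one bit-slice of all inputs per cycle. Below is a minimal Python sketch of the idea, assuming unsigned inputs for brevity; a real DCT datapath would use two's complement, with the sign bit-slice subtracted rather than added.

```python
def da_rom(coeffs):
    """ROM[j] = sum of the fixed coefficients selected by the bits of index j."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (j >> k) & 1)
            for j in range(1 << n)]

def da_inner_product(coeffs, xs, nbits):
    """Bit-serial inner product sum(c_k * x_k) via ROM lookups and shift-adds."""
    rom = da_rom(coeffs)
    acc = 0
    for b in range(nbits - 1, -1, -1):  # MSB first: shift-and-accumulate
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        acc = (acc << 1) + rom[addr]
    return acc

coeffs, xs = [3, 5, 7, 2], [9, 4, 1, 6]
assert da_inner_product(coeffs, xs, nbits=4) == sum(c * x for c, x in zip(coeffs, xs))
```

The only per-cycle hardware is a ROM access and an add, which is what makes a one-sample-per-clock pipelined DCT practical.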

156 citations


Proceedings ArticleDOI
10 Oct 1994
TL;DR: A behavioral synthesis method targeting low power consumption for data-dominated CMOS circuits is presented, considering loops, conditional branches, and scheduling constructs such as multicycling, chaining, and structural pipelining.
Abstract: We present a behavioral synthesis method targeting low power consumption for data-dominated CMOS circuits. A study of how the high-level synthesis process affects power consumption is presented, based on which we have developed the first allocation method for low power. We also present a method of optimizing the controller to reduce data path power dissipation. We consider loops, conditional branches, and scheduling constructs such as multicycling, chaining, and structural pipelining. The techniques were implemented within the framework of an existing behavioral synthesis system. Experiments performed on various examples and benchmarks show that low power circuits can be synthesized by our method with very low or zero overheads.

151 citations


Patent
Prathima Agrawal1, Soumitra Bose1
19 Dec 1994
TL;DR: In this article, test vectors for a circuit containing both logic gates and memory blocks are evaluated by applying candidate test vectors to good and faulty versions of the circuit in a computer simulation.
Abstract: Test vectors for a circuit containing both logic gates and memory blocks are evaluated by applying candidate test vectors to good and faulty versions of the circuit in a computer simulation. The functions of the gates and interconnections in the circuit are stored in memory and the operation of the good and faulty circuits is simulated concurrently. During the simulation, a memory record is created for storing the state of a circuit element in a faulty circuit if the fault is visible at the element. Such records are removed when no longer needed, which speeds up the simulation. A multiprocessor in a pipeline configuration is disclosed for performing the simulation. A first branch in the pipeline simulates the logic gates in the circuit; a second branch simulates the memory blocks.
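
The record-keeping idea can be illustrated with a small sketch: per-fault state is stored only for elements where the fault is visible, and the record disappears once good and faulty values agree. The netlist format and names below are illustrative assumptions, not the patent's implementation.

```python
def simulate(gates, inputs, fault=None):
    """Evaluate a netlist given as {net: (op, in1, in2)} in topological order."""
    values = dict(inputs)
    for net, (op, a, b) in gates.items():  # dicts preserve insertion order
        v = (values[a] & values[b]) if op == "AND" else (values[a] | values[b])
        if fault is not None and net == fault[0]:
            v = fault[1]  # stuck-at fault overrides the computed value
        values[net] = v
    return values

def concurrent_records(gates, inputs, faults):
    """Keep a record per fault only for nets where the fault is visible."""
    good = simulate(gates, inputs)
    records = {}
    for f in faults:
        diff = {net: v for net, v in simulate(gates, inputs, f).items()
                if v != good[net]}
        if diff:  # drop the record entirely when the fault is invisible
            records[f] = diff
    return records

gates = {"n1": ("AND", "a", "b"), "n2": ("OR", "n1", "c")}
print(concurrent_records(gates, {"a": 1, "b": 1, "c": 0},
                         faults=[("n1", 0), ("n1", 1)]))
# {('n1', 0): {'n1': 0, 'n2': 0}} -- the stuck-at-1 fault produced no record
```

Discarding invisible-fault records is what keeps memory bounded and, as the abstract notes, speeds up the simulation.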

136 citations


Patent
Siamak Arya1, Howard G. Sachs1
27 Oct 1994
TL;DR: In this paper, a computing system is described in which groups of individual instructions are executed in parallel by processing pipelines, and instructions to be executed by different pipelines are supplied to the pipelines simultaneously.
Abstract: A computing system is described in which groups of individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. During compilation of the instructions, those which can be executed in parallel are identified. The system includes a register for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags and group identification tags indicative of the pipeline to which they should be dispatched and the group of instructions which may be dispatched during the same operation. The pipeline and group identification tags are used to dispatch the appropriate groups of instructions simultaneously to the differing pipelines.
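
A minimal sketch of tag-driven dispatch, assuming the compile-time tagging has already happened: instructions sharing a group tag issue in the same cycle, each routed to the pipeline its pipeline tag names. The tuple format and pipeline names are assumptions for illustration.

```python
from itertools import groupby

def dispatch(tagged_instrs):
    """tagged_instrs: list of (group_tag, pipeline_tag, instr) in program order."""
    for group, members in groupby(tagged_instrs, key=lambda t: t[0]):
        # All members of one group leave the instruction register together.
        issue = {pipe: instr for _, pipe, instr in members}
        print(f"cycle for group {group}: {issue}")

dispatch([(0, "ALU0", "add r1,r2,r3"), (0, "ALU1", "sub r4,r5,r6"),
          (1, "MEM",  "load r7,[r1]")])
# cycle for group 0: {'ALU0': 'add r1,r2,r3', 'ALU1': 'sub r4,r5,r6'}
# cycle for group 1: {'MEM': 'load r7,[r1]'}
```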

129 citations


Patent
23 Dec 1994
TL;DR: In this paper, a data stream processing unit comprises a CPU which comprises an ALU, a shift/extract unit, timers, a scheduler, an event system, a plurality of sets of general purpose registers, masquerade registers, a pipeline controller, a memory controller, and a pair of internal buses.

Abstract: A data stream processing unit comprises a CPU which comprises an ALU, a shift/extract unit, timers, a scheduler, an event system, a plurality of sets of general purpose registers, a plurality of sets of special purpose registers, masquerade registers, a pipeline controller, a memory controller, and a pair of internal buses. The multiple sets of general and special purpose registers improve the speed of the CPU in switching between environments. The pipeline controller, the scheduler, the event system, and the masquerade registers facilitate the implementation and execution of the methods of the present invention, such as efficient thread scheduling, branch delay handling, and elimination of delay slots after stores, which provide further increases in performance and bandwidth.

119 citations


Proceedings ArticleDOI
01 Apr 1994
TL;DR: Trace-driven simulation shows that the proposed design, which uses fewer resources, offers better performance than previously proposed alternatives for most programs; the paper also indicates how to further improve this design.

Abstract: Accurate branch prediction is critical to performance; mispredicted branches mean that tens of cycles may be wasted in superscalar architectures. Architectures combining very effective branch prediction mechanisms with modified branch target buffers (BTBs) have been proposed for wide-issue processors. These mechanisms require considerable processor resources. Concurrently, the larger address space of 64-bit architectures introduces new obstacles and opportunities. A larger address space means branch target buffers become more expensive. In this paper, we show how a combination of less expensive mechanisms can achieve better performance than BTBs. This combination relies on a number of design choices described in the paper. We used trace-driven simulation to show that our proposed design, which uses fewer resources, offers better performance than previously proposed alternatives for most programs, and we indicate how to further improve this design.

117 citations


Patent
Roger W. Swanson1
31 Mar 1994
TL;DR: In this article, context switching for each of the pipelined processing circuits within a pipeline is accomplished without flushing data from the pipeline, by sending commands and data together through the pipeline and differentiating them with a flag added to each word.
Abstract: A pipelined processing system in which context switching for each of the pipelined processing circuits within the pipeline may be accomplished without flushing the data from the pipeline. This is accomplished by sending the pipeline commands and data together through the pipeline and differentiating the commands from the data using a flag added to the commands and data which specifies whether the associated data word is a command or data. During operation of the pipeline, when the input data is received by one of the pipelined processing circuits in the pipeline, the flag is checked to see if the associated data word includes a command. If the associated data word includes data to be processed, it is processed in accordance with the current configuration of the pipeline. However, if the associated data word includes a command for setup and control and the like, each pipelined processing circuit within the pipeline compares its identification value with a tag field in the command to determine whether it is to be reconfigured by that command. If it is to be reconfigured by that command, the appropriate context switching and the like takes place. However, if the current pipelined processing circuit is not to be reconfigured by that command, that command is passed through the current pipelined processing circuit unprocessed so that a similar determination may be made by the next pipelined processing circuit in the pipeline. As a result, setup and control commands for the pipelined processing circuits may be passed through the data processing pipeline along with the data in the desired processing order such that a pipeline data flush is not necessary between reconfigurations of the pipelined processing circuits. Since the pipeline need not be flushed when processes are changed, processing efficiency and throughput are substantially improved.
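
The flag-and-tag mechanism can be sketched in a few lines: every word carries a command/data flag, and a command carries a tag that each stage compares with its own ID before either reconfiguring itself or passing the command along unprocessed. Field names and the toy "add a constant" configuration are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Word:
    is_command: bool
    payload: int        # data value, or new configuration for a stage
    tag: int = -1       # target stage ID when is_command is True

class Stage:
    def __init__(self, stage_id):
        self.stage_id, self.config = stage_id, 0

    def process(self, word):
        """Return the word to pass downstream, or None if consumed here."""
        if word.is_command:
            if word.tag == self.stage_id:
                self.config = word.payload  # context switch, no pipeline flush
                return None                 # command consumed at its target
            return word                     # not ours: pass through unprocessed
        return Word(False, word.payload + self.config)  # data: apply config

stages = [Stage(0), Stage(1)]
stream = [Word(True, 10, tag=1), Word(False, 5), Word(True, 3, tag=0), Word(False, 5)]
for w in stream:
    for s in stages:
        w = s.process(w)
        if w is None:
            break
    if w is not None:
        print("out:", w.payload)   # out: 15, then out: 18
```

Because commands travel in order with the data, the second data word is processed under the new stage-0 configuration without any flush, which is the throughput win the abstract claims.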

110 citations


Proceedings ArticleDOI
01 May 1994
TL;DR: In this article, a 10-bit 20-MS/s pipeline A/D converter implemented in 1.2 μm CMOS technology achieves a power dissipation of 35 mW at full-speed operation.

Abstract: This paper describes a 10-bit 20-MS/s pipeline A/D converter implemented in 1.2 μm CMOS technology which achieves a power dissipation of 35 mW at full-speed operation. Circuit techniques used to achieve this level of power dissipation include operation on a 3.3 V power supply, optimum scaling of capacitor values through the pipeline, and digital correction to allow the use of dynamic comparators. Measured performance includes 0.6 LSB of INL and 59.1 dB of SNDR for a 100 kHz input at 20 MS/s. At Nyquist sampling (10 MHz input), SNDR is 55.0 dB.

94 citations


01 Jan 1994
TL;DR: The technique can sustain maximum communication bandwidth while achieving an arbitrarily low, non-zero probability of synchronization failure, P_f, with the price in both latency and chip area being O(log 1/P_f).

Abstract: Pipeline synchronization is a simple, low-cost, high-bandwidth, high-reliability solution to interfaces between synchronous and asynchronous systems, or between synchronous systems operating from different clocks. The technique can sustain maximum communication bandwidth while achieving an arbitrarily low, non-zero probability of synchronization failure, P_f, with the price in both latency and chip area being O(log 1/P_f). Pipeline synchronization has been successfully applied to high-performance inter-computer communication in multicomputers and local-area networks.
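
The O(log 1/P_f) price follows from the standard synchronizer reliability argument; this is a generic derivation, not necessarily the paper's exact notation. A metastable latch with resolution time constant τ fails to settle within time t with probability

```latex
P_f \approx e^{-t/\tau}
\qquad\Longrightarrow\qquad
t \approx \tau \ln\frac{1}{P_f}
```

so the settling time, and with it the latency and area of the pipeline-stage cascade that provides it, grows only logarithmically in 1/P_f.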

Patent
01 Mar 1994
TL;DR: In this paper, an instruction sequencer issues into the pipeline an instruction that computes the register value minus a pre-determined number of iterations, followed by that number of iterations; the instruction later returns with the calculated count.
Abstract: In a pipelined processor, an apparatus for handling string operations. When a string operation is received by the processor, the length of the string as specified by the programmer is stored in a register. Next, an instruction sequencer issues an instruction that computes the register value minus a pre-determined number of iterations to be issued into the pipeline. Following the instruction, the pre-determined number of iterations are issued to the pipeline. When the instruction returns with the calculated number, the instruction sequencer then knows exactly how many iterations should be executed. Any extra iterations that had initially been issued are canceled by the execution unit, and additional iterations are issued as necessary. A loop counter in the instruction sequencer is used to track the number of iterations.
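
Once the issued instruction returns with length minus the speculative count, the resolution step is simple arithmetic. A sketch, where the speculative count of 4 is an assumed value:

```python
SPECULATIVE_ITERS = 4  # pre-determined iteration count issued up front (assumed)

def resolve_string_op(length):
    """Decide what to do once the pipeline returns length - SPECULATIVE_ITERS."""
    remaining = length - SPECULATIVE_ITERS
    if remaining < 0:
        return f"cancel {-remaining} of the {SPECULATIVE_ITERS} issued iterations"
    return f"issue {remaining} additional iterations"

print(resolve_string_op(2))  # cancel 2 of the 4 issued iterations
print(resolve_string_op(7))  # issue 3 additional iterations
```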

Patent
31 Aug 1994
TL;DR: In this paper, a target finder array in the instruction cache contains a lower portion of the target address and a block encoding indicating whether the target address is within the same 2K-byte block as the branch instruction, or in the next or previous 2K-byte block.
Abstract: A target finder array in the instruction cache contains a lower portion of the target address and a block encoding indicating if the target address is within the same 2K-byte block that the branch instruction is in, or if the target address is in the next or previous 2K-byte block. The upper portion of the target address, its block number, which corresponds to the starting address of a 2K block, is generated from the target finder simply by taking the upper portion or block number of the branch instruction and incrementing and decrementing it, and using the block encoding in the finder to select either the unmodified block number of the branch instruction, or the incremented or decremented block number of the branch instruction. The lower portion of the target address that was stored in the finder is concatenated with the selected block number to get the predicted target address. The target address can be predicted in parallel with reading an instruction out of the cache, making the target available at the same time the branch instruction is available, eliminating pipeline stalls for correctly predicted branches. The initially predicted target address in the finder is generated by a quick decode of the instruction and is written when the cache is loaded from memory. The initial prediction does not have to be accurate because branch resolution logic will update the finder on each branch resolution. Register indirect branches and exceptions may also be predicted. Two instruction sets may be accommodated by different block encodings to indicate the instruction set. By using the block encoding, the finder array is small and inexpensive.
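
A sketch of the target reconstruction under stated assumptions: 2K-byte blocks imply 11 low-order address bits, and a 2-bit block encoding distinguishes same/next/previous block. The encoding values below are illustrative; the patent only requires that the three cases be distinguishable.

```python
BLOCK_BITS = 11          # 2K-byte block => 11 low-order address bits
SAME, NEXT, PREV = 0, 1, 2

def predict_target(branch_addr, finder_low, block_enc):
    """Concatenate the selected block number with the stored lower portion."""
    block = branch_addr >> BLOCK_BITS
    block = {SAME: block, NEXT: block + 1, PREV: block - 1}[block_enc]
    return (block << BLOCK_BITS) | finder_low

# Branch at 0x1FFE jumping just past its 2K block boundary:
print(hex(predict_target(0x1FFE, finder_low=0x004, block_enc=NEXT)))  # 0x2004
```

Only an increment, a decrement, and a 3-way select stand between the finder entry and the predicted target, which is why the prediction can run in parallel with the cache read.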

Journal ArticleDOI
TL;DR: In this article, the authors present a new mitigation design approach that not only reduces AC voltages effectively and economically but also provides cathodic protection for the protected pipeline; its performance is illustrated with computer simulations, which show how important an accurate electrical model of the soil structure is in any interference study.

Abstract: In joint-use corridors where both pipelines and AC electric transmission lines are present, a portion of the energy contained in the electromagnetic field surrounding the electric transmission lines is captured by each pipeline, resulting in induced AC voltages which vary in magnitude throughout the length of each pipeline. During a fault on any of the transmission lines, energization of the earth by supporting structures near the fault can result in large voltages appearing locally between the earth and the steel wall of any nearby pipeline. Some form of mitigation is usually required to reduce these voltages to acceptable levels for the protection of personnel and of the pipeline itself. This paper presents a new mitigation design approach which not only reduces AC voltages effectively and economically, but also provides cathodic protection for the protected pipeline. Performance of this new mitigation method is illustrated with results from computer simulations, which show how important it is to have an accurate electrical model of the soil structure in any interference study. Results from large-scale mitigation design studies performed for ANR Pipeline Company and other gas transmission companies are presented.

Patent
22 Feb 1994
TL;DR: In this article, a method for improving CGSI imaging system throughput by selectively decimating rows and/or columns of object image data prior to warp transformation in order to more closely approach a 1:1 compression ratio is presented.
Abstract: A method for improving CGSI imaging system throughput by selectively decimating rows and/or columns of object image data prior to warp transformation, in order to more closely approach a 1:1 compression ratio. The best results are obtained when decimation is performed on compressed object image data prior to decompression.
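
One plausible reading of the decimation step as code, with the 2:1 threshold and the nested-list image format assumed for illustration: if the warp will shrink an axis by half or more, every other row or column is dropped first, so the warp itself operates near a 1:1 ratio.

```python
def decimate(image, scale_y, scale_x):
    """image: list of rows; scale < 1.0 means the warp shrinks that axis."""
    step_y = 2 if scale_y <= 0.5 else 1   # drop every other row if shrinking >= 2:1
    step_x = 2 if scale_x <= 0.5 else 1   # likewise for columns
    return [row[::step_x] for row in image[::step_y]]

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(decimate(img, scale_y=0.4, scale_x=0.9))
# [[1, 2, 3, 4], [9, 10, 11, 12]] -- rows halved, columns untouched
```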

Journal ArticleDOI
16 Feb 1994
TL;DR: A video DSP with a macroblock-level pipeline and a SIMD-type vector-pipeline architecture (VDSP2) has been developed, using 0.5 μm triple-layer-metal CMOS technology.

Abstract: A video DSP with a macroblock-level pipeline and a SIMD-type vector-pipeline architecture (VDSP2) has been developed, using 0.5 μm triple-layer-metal CMOS technology. This 17.00 mm × 15.00 mm chip consists of 2.5 M transistors and operates at 100 MHz. The real-time encoder and decoder specified in the MPEG-2 main profile at main level can be realized with two VDSP2s and a motion estimation (ME) unit, and with one VDSP2, respectively, at an 80 MHz clock rate, with a total power dissipation of 4.2 W at 3.3 V.

Patent
Joerg Schepers1
01 Mar 1994
TL;DR: In this article, instructions whose predecessor instructions have already been processed are selected and examined to determine whether a minimum number of delay cycles must elapse before their execution; a heuristic selection process then chooses among them.

Abstract: To execute a program rapidly on superscalar microprocessors, the individual instructions of the program must be divided into instruction groups that can be processed in parallel by the processing units of the microprocessor. In doing so, data-flow dependences, control-flow dependences, and pipeline conflicts must be taken into account. For this purpose, the first step is to select the instructions whose predecessor instructions have already been processed and to determine for each whether a minimum number of delay cycles must elapse before its execution; the instructions are stored in a list together with this minimum number. From these instructions, one is selected using a heuristic selection process and placed into an instruction group in which it can be processed in the earliest possible execution cycle.
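
The selection loop resembles classic list scheduling. The sketch below picks one instruction per cycle for brevity (the patent packs several into a group), and the most-successors-first heuristic is an assumed stand-in for the patent's unspecified heuristic.

```python
def schedule(deps, delays, succ_count):
    """deps: instr -> set of predecessors; delays: instr -> min delay cycles."""
    done, cycle_of, cycle = set(), {}, 0
    while len(done) < len(deps):
        # Ready: every predecessor already scheduled.
        ready = [i for i in deps if i not in done and deps[i] <= done]
        # Legal: the required delay after each predecessor has elapsed.
        legal = [i for i in ready
                 if all(cycle_of[p] + delays[i] <= cycle for p in deps[i])]
        if legal:
            pick = max(legal, key=lambda i: succ_count[i])  # heuristic choice
            cycle_of[pick] = cycle
            done.add(pick)
        cycle += 1
    return cycle_of

deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(schedule(deps, delays={"a": 0, "b": 1, "c": 1, "d": 1},
               succ_count={"a": 2, "b": 1, "c": 1, "d": 0}))
# {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```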

Patent
04 Mar 1994
TL;DR: In this article, a single line buffer in a motion video card is used for both vertical reduction of the pixel image before storage in a video memory buffer and vertical expansion after being outputted by the video buffer.
Abstract: A single line buffer in a motion video card is used for both vertical reduction of the pixel image before storage in a video memory buffer and vertical expansion of the pixel image after being outputted by the video memory buffer. When the desired display size is smaller than the original pixel image size, then the line buffer is used by the input pipeline to reduce the image. When the desired display size is larger than the original pixel image size (or larger than the image stored in the memory buffer), then the line buffer is used by the output pipeline to enlarge the image.

Patent
11 Jan 1994
TL;DR: In this paper, an address pipeline is provided to hold the addresses of the instructions presently in the instruction pipeline, which facilitates tracing only executed instructions, and permits stopping the data processor during a branch delay slot without losing the branch information.
Abstract: In a pipelined data processor (11), an address pipeline (39, 41) is provided to hold the addresses of the instructions presently in the instruction pipeline (23, 25). The address pipeline facilitates tracing only executed instructions, and permits stopping the data processor during a branch delay slot without losing the branch information.

Journal ArticleDOI
TL;DR: A general-purpose fuzzy processor is presented, the core of which is based on an analog-numerical approach combining the inherent advantages of analog and digital implementations, above all as regards noise margins.

Abstract: In this paper we present a design for a general-purpose fuzzy processor, the core of which is based on an analog-numerical approach combining the inherent advantages of analog and digital implementations, above all as regards noise margins. The architectural model proposed was chosen so as to obtain a processor capable of working with a considerable degree of parallelism. The internal structure of the processor is organized as a cascade of pipeline stages which perform parallel execution of the processes into which each inference can be decomposed. A particular feature of the project is the definition of a 'fuzzy gate', which executes elementary fuzzy computations and on which construction of the whole core of the processor is based. Designed in CMOS technology, the core can be integrated into a single chip and can easily be extended. The obtainable performance, on the order of 50 million fuzzy rules per second, is considerable.

Patent
08 Mar 1994
TL;DR: A pipelined memory (20) as mentioned in this paper includes output registers (34) and output enable registers (48) which are used to electrically switch between the asynchronous operating mode and the synchronous operating mode.
Abstract: A pipelined memory (20) has a synchronous operating mode and an asynchronous operating mode. The memory (20) includes output registers (34) and output enable registers (48) which are used to electrically switch between the asynchronous operating mode and the synchronous operating mode. In addition, in the synchronous operating mode, the depth of pipelining can be changed between a three stage pipeline and a two stage pipeline. By changing the depth of pipelining, the memory (20) can operate using a greater range of clock frequencies. In addition, the operating frequency can be changed to facilitate testing and debugging of the memory (20).

Journal ArticleDOI
TL;DR: A timing constraint formulation for the correct clocking of wave-pipelined systems and implications and motivations for the use of accurate delay models and exact timing analysis in the determination of combinational logic delays are given.
Abstract: Wave-pipelining is a timing methodology used in digital systems to achieve maximal rate operation. Using this technique, new data are applied to the inputs of a combinational block before the previous outputs are available, thus effectively pipelining the combinational logic and maximizing the utilization of the logic without inserting registers. This paper presents a timing constraint formulation for the correct clocking of wave-pipelined systems. Both single- and multiple-stage systems, including feedback, are considered. Based on the formulation of this paper, several important new results are presented relating to performance limits of wave-pipelined circuits. These results include the specification of distinct and disjoint regions of valid operation dependent on the clock period, intentional clock skew, and the global clock latency. Implications and motivations for the use of accurate delay models and exact timing analysis in the determination of combinational logic delays are also given, and an analogous relationship between the multi-stage system and the single-stage system in terms of performance limits is shown. The minimum clock period is obtained by clock skew optimization formulated as a linear program. In addition, important special cases are examined and their relative performance limits are analyzed.
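
A simplified single-stage form of such constraints can be stated directly (this omits the paper's skew, latency, and feedback terms). With maximum and minimum combinational delays D_max and D_min, register setup and hold times t_su and t_h, and the output sampled L cycles after launch, the sampling edge must land after the slowest wave settles but before the fastest part of the next wave arrives:

```latex
D_{\max} + t_{su} \;\le\; L\,T_{clk} \;\le\; T_{clk} + D_{\min} - t_{h}
\quad\Longrightarrow\quad
T_{clk} \;\ge\; D_{\max} - D_{\min} + t_{su} + t_{h}
```

The clock period is thus bounded by the delay spread rather than the total delay, which is precisely why the paper stresses accurate delay models and exact timing analysis.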

Patent
Son Dao-Trong1, Juergen Haas1, Rolf Mueller1
20 Jul 1994
TL;DR: A pipelined floating point processor is proposed in which the addition pipeline is reorganized so that no wait cycle is needed when the addition uses the result of an immediately preceding multiplication (fast multiply-add instruction).
Abstract: A pipelined floating point processor in which the addition pipeline is reorganized so that no wait cycle is needed when the addition uses the result of an immediately preceding multiplication (fast multiply-add instruction). The reorganization implies the following changes to the existing data flow of the pipelined floating point processor: feed-back of normalized data from the multiplier M into the aligners AL1 and AL2 via path ND; a shift-left-one-digit feature on both sides of the data path to account for a possible leading zero digit of the product, together with special zeroing of potential guard digits by Z1 and Z2; and an exponent extended to 9 bits for overflow and underflow recognition, with the exponent result reset to zero on the fly by a true zero unit (T/C) in case of underflow.

Journal ArticleDOI
01 Mar 1994
TL;DR: Simulation studies show that application of pipelining techniques can provide an effective throughput of one 32-bit addition every 1.6 ns using minimal hardware.
Abstract: Negative differential resistance characteristics of several new quantum electronic devices have been used to design high-speed logic gates with the latching property. These latching gates form the basis of the ultrafast pipelined adder circuit described in this paper. The latching or memory feature of these circuits, which was previously considered to be a nuisance in the design of combinational circuits, is exploited to overcome the pipeline overheads of area and time. Simulation studies show that application of pipelining techniques can provide an effective throughput of one 32-bit addition every 1.6 ns using minimal hardware.

Patent
Alain Artieri1
24 May 1994
TL;DR: In this article, a system that processes compressed data arriving in packets corresponding to picture blocks, the packets being separated by headers containing decoding parameters of the packets, is described, where a memory bus is controlled by a memory controller to exchange data between the processing elements and a picture memory.
Abstract: A system that processes compressed data arriving in packets corresponding to picture blocks, the packets being separated by headers containing decoding parameters of the packets. A memory bus is controlled by a memory controller to exchange data between the processing elements and a picture memory. A pipeline circuit contains a plurality of processing elements. A parameter bus provides packets to be processed to the pipeline circuit, as well as the decoding parameters to elements of the system. The parameter bus is controlled by a variable length decoder that receives the compressed data from the memory bus and that extracts the packets and the decoding parameters therefrom.

Patent
28 Sep 1994
TL;DR: A pipeline processing device that enables optimized pipeline processing with a simple configuration and control method is presented, together with a clipping processing device that uses it.
Abstract: An objective of this invention is to provide a pipeline processing device that enables the implementation of optimized pipeline processing while having a simple configuration and control method, and a clipping processing device that uses this pipeline processing device. Data is sequentially transferred to pipeline register sections (500 to 506), but only when there is processing data in each previous stage, and the given data processing is performed in data processing sections (520 to 524). After the end of input of processing data in which a plurality of data items is formed into one string D[0:3], this data is automatically extracted from the pipeline register sections (500 to 506). These transfer and automatic-extraction operations in the pipeline control sections (530 to 536) are controlled by an LD signal, which is formed from ENIN and FLASHIN signals.

Journal ArticleDOI
TL;DR: A parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference and several message optimizations that address the issues of overhead and synchronization in message transmission.
Abstract: The lack of high-level languages and good compilers for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
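
Two of the message optimizations are easy to sketch; the tuple format below is an assumption for illustration. Vectorization merges messages sharing a source and a destination into one larger message; accumulation applies a commutative, associative operator locally so that only one value per destination is sent.

```python
from collections import defaultdict

def vectorize(messages):
    """messages: list of (src, dst, value) -> one combined message per (src, dst)."""
    combined = defaultdict(list)
    for src, dst, value in messages:
        combined[(src, dst)].append(value)
    return [(src, dst, vals) for (src, dst), vals in combined.items()]

def accumulate(messages, op):
    """Reduce all values per destination locally; send one value per dst."""
    acc = {}
    for _, dst, value in messages:
        acc[dst] = op(acc[dst], value) if dst in acc else value
    return acc

msgs = [(0, 1, 5), (0, 1, 7), (0, 2, 3)]
print(vectorize(msgs))                           # [(0, 1, [5, 7]), (0, 2, [3])]
print(accumulate(msgs, op=lambda a, b: a + b))   # {1: 12, 2: 3}
```

Both trade per-message overhead for a little local computation, which is where the reported gains over naive message passing come from.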


Patent
30 Mar 1994
TL;DR: In this article, a processing apparatus which adaptively performs image compensation and encoding/expansion and decoding processing such as discrete cosine transformation (DCT), inner product computation, image data addition, and image data differential processing, etc.
Abstract: A processing apparatus which adaptively performs image compensation and encoding/expansion and decoding processing, such as discrete cosine transformation (DCT)/inverse discrete cosine transformation (IDCT), inner-product computation, image data addition, and image data differential processing, for blocks of image data of size m×n. It is provided with (a) a plurality of parallel processing units 1 to 4, each of which performs addition, subtraction, various types of logical computation, magnitude comparison, computation of absolute values of differences, butterfly addition/subtraction, multiplication, and accumulation; (b) mutually connected pipeline memories 5 to 7, which are disposed so as to connect adjoining processing units; and (c) data selectors 41 to 44, which selectively apply input data to the processing units 1 to 4. Adjoining processing units are coupled via the mutually connected pipeline memories, and an internal pipeline memory in each processing unit is selected to constitute a predetermined data flow path, thereby performing DCT or other desired video signal processing.

Patent
19 Oct 1994
TL;DR: In this article, a processor has an execution pipeline, a register file and a controller, and the controller makes the first result stored in the register file available in the event that the first results are needed for the execution of a subsequent instruction.
Abstract: A processor method and apparatus. The processor has an execution pipeline, a register file and a controller. The execution pipeline is for executing an instruction and has a first stage for generating a first result and a last stage for generating a final result. The register file is for storing the first result and the final result. The controller makes the first result stored in the register file available in the event that the first result is needed for the execution of a subsequent instruction. By storing the result of the first stage in the register file, the length of the execution pipeline is reduced from that of the prior art. Furthermore, logic required for providing inputs to the execution pipeline is greatly simplified over that required by the prior art.