
Showing papers on "Very-large-scale integration published in 1998"


Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, the authors describe new bistable logic families using resonant tunneling diodes (RTD's) in conjunction with high-performance III-V devices such as heterojunction bipolar transistors (HBT's) and modulation-doped field-effect transistors (MODFET's) for binary and multiple-valued logic.
Abstract: Many semiconductor quantum devices utilize a novel tunneling transport mechanism that allows picosecond device switching speeds. The negative differential resistance characteristic of these devices, achieved due to resonant tunneling, is also ideally suited for the design of highly compact, self-latching logic circuits. As a result, quantum device technology is a promising emerging alternative for high-performance very-large-scale-integration design. The bistable nature of the basic logic gates implemented using resonant tunneling devices has been utilized in the development of a gate-level pipelining technique, called nanopipelining, that significantly improves the throughput and speed of pipelined systems. The advent of multiple-peak resonant tunneling diodes provides a viable means for efficient design of multiple-valued circuits with decreased interconnect complexity and reduced device count as compared to multiple-valued circuits in conventional technologies. This paper details various circuit design accomplishments in the area of binary and multiple-valued logic using resonant tunneling diodes (RTD's) in conjunction with high-performance III-V devices such as heterojunction bipolar transistors (HBT's) and modulation-doped field-effect transistors (MODFET's). New bistable logic families using RTD+HBT and RTD+MODFET gates are described that provide a single-gate, self-latching majority function in addition to basic NAND, NOR, and inverter gates.

477 citations


Journal ArticleDOI
TL;DR: Heuristics with good performance bounds can be derived for combinational circuits tested using built-in self-test (BIST) and considerable reduction in power dissipation can be obtained using the proposed techniques.
Abstract: Reduction of power dissipation during test application is studied for scan designs and for combinational circuits tested using built-in self-test (BIST). The problems are shown to be intractable. Heuristics to solve these problems are discussed. We show that heuristics with good performance bounds can be derived for combinational circuits tested using BIST. Experimental results show that considerable reduction in power dissipation can be obtained using the proposed techniques.

338 citations


Proceedings ArticleDOI
29 Sep 1998
TL;DR: By exploiting the spatial regularity of the new algorithm, the requirements for the two dominant elements in a VLSI implementation, the memory size and the number of complex multipliers, have been minimized and the area/power efficiency has been enhanced.
Abstract: The FFT processor is one of the key components in the implementation of wideband OFDM systems. Architectures with a structured pipeline have been used to meet the fast, real-time processing demand and the low-power consumption requirement in a mobile environment. An architecture based on a new form of FFT, the radix-2^i algorithm derived by cascade decomposition, is proposed. By exploiting the spatial regularity of the new algorithm, the requirements for the two dominant elements in a VLSI implementation, the memory size and the number of complex multipliers, have been minimized. Progressive wordlength adjustment has been introduced to optimize the total memory size for a given signal-to-quantization-noise ratio (SQNR) requirement in fixed-point processing. A new complex multiplier based on distributed arithmetic further enhances the area/power efficiency of the design. A single-chip processor for the 1K-complex-point FFT is used to demonstrate the design issues under consideration.
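
To make the cascade decomposition concrete, here is a minimal radix-2 FFT in Python; each recursion level corresponds to one stage of a hardware pipeline. This is an illustration of the FFT structure only, not the paper's radix-2^i architecture, and all names are ours.

```python
# Minimal radix-2 decimation-in-time FFT sketch. In a cascade-decomposed
# pipeline, each recursion level below maps to one hardware stage.
import cmath

def fft(x):
    n = len(x)                      # n must be a power of two
    if n == 1:
        return x
    even = fft(x[0::2])             # half-size sub-transforms
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

print(fft([1, 1, 1, 1, 0, 0, 0, 0]))  # 8-point example
```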

322 citations


Proceedings ArticleDOI
01 May 1998
TL;DR: To achieve a non-iterative design flow, it is proposed that early synthesis stages should use "wireplanning" to distribute delays over the functional elements and interconnect, and layout synthesis should use its degrees of freedom to realize those delays.
Abstract: A shift is proposed in the design of VLSI circuits. In conventional design, higher levels of synthesis produce a netlist, from which layout synthesis builds a mask specification for manufacturing. Timing analysis is built into a feedback loop to detect timing violations which are then used to update specifications to synthesis. Such iteration is undesirable, and for very high performance designs, infeasible. The problem is likely to become much worse with future generations of technology. To achieve a non-iterative design flow, we propose that early synthesis stages should use "wireplanning" to distribute delays over the functional elements and interconnect, and layout synthesis should use its degrees of freedom to realize those delays. In this paper we attempt to quantify this problem for future technologies and propose some solutions for a "constant delay" methodology.

300 citations


Proceedings ArticleDOI
01 Nov 1998
TL;DR: The Imagine architecture supports the stream programming model by providing a bandwidth hierarchy tailored to the demands of media applications, reducing the global register and memory bandwidth required by typical applications by factors of 13 and 21, respectively.
Abstract: Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are poorly matched to conventional microprocessor architectures, they are a good fit for modern VLSI technology with its high arithmetic capacity but limited global bandwidth. The stream programming model, in which an application is coded as streams of data records passing through computation kernels, exposes both parallelism and locality in media applications that can be exploited by VLSI architectures. The Imagine architecture supports the stream programming model by providing a bandwidth hierarchy tailored to the demands of media applications. Compared to a conventional scalar processor, Imagine reduces the global register and memory bandwidth required by typical applications by factors of 13 and 21, respectively. This bandwidth efficiency enables a single-chip Imagine processor to achieve a peak performance of 16.2 GFLOPS (single-precision floating point) and sustained performance of up to 8.5 GFLOPS on media processing kernels.
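
A rough sketch of the stream programming model (not Imagine's actual programming tools; the kernel names are invented): records flow through a chain of kernels, and each kernel reads only its input stream, which is what exposes the locality the paper exploits.

```python
# Hypothetical two-kernel stream pipeline; generators stand in for the
# stream buffers of a real stream processor.
def scale(stream, k):          # kernel 1: multiply every record by k
    for rec in stream:
        yield rec * k

def clamp(stream, lo, hi):     # kernel 2: saturate records to [lo, hi]
    for rec in stream:
        yield max(lo, min(hi, rec))

pixels = range(10)             # a stream of data records
print(list(clamp(scale(pixels, 3), 0, 20)))
```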

280 citations


Proceedings ArticleDOI
01 Nov 1998
TL;DR: A fast and exact algorithm that minimizes total area subject to a maximum delay bound; it is based on Lagrangian relaxation and "one-gate/wire-at-a-time" greedy optimizations and is extremely economical and fast.
Abstract: This paper considers simultaneous gate and wire sizing for general very large scale integrated (VLSI) circuits under the Elmore delay model. We present a fast and exact algorithm which can minimize total area subject to a maximum delay bound. The algorithm can be easily modified to give exact algorithms for optimizing several other objectives (e.g., minimizing maximum delay or minimizing total area subject to arrival time specifications at all inputs and outputs). No previous algorithm for simultaneous gate and wire sizing can guarantee exact solutions for general circuits. Our algorithm is an iterative one with a guarantee on convergence to global optimal solutions. It is based on Lagrangian relaxation and "one-gate/wire-at-a-time" greedy optimizations, and is extremely economical and fast. For example, we can optimize a circuit with 27648 gates and wires in 11.53 min using under 23 Mbytes of memory on a PC with a 333-MHz Pentium II processor.
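
For reference, the Elmore delay used as the timing model here is, for each node, the sum over upstream resistances of resistance times total downstream capacitance. A toy Python sketch for an RC ladder; the Lagrangian-relaxation sizing iteration itself is not reproduced.

```python
# Elmore delay of an RC ladder driven at one end: each segment's
# resistance charges all capacitance downstream of it.
def elmore_delay(R, C):
    """R[i], C[i]: resistance (ohm) and capacitance (F) of segment i."""
    delay = 0.0
    for i in range(len(R)):
        delay += R[i] * sum(C[i:])   # R_i sees all downstream capacitance
    return delay

# Widening a wire lowers its R but raises its C; a "one-wire-at-a-time"
# pass re-optimizes each segment while holding the others fixed.
print(elmore_delay([100.0, 100.0], [1e-12, 1e-12]))  # ~3e-10 s
```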

255 citations


Proceedings ArticleDOI
11 May 1998
TL;DR: By exploiting the spatial regularity of the new algorithm, a minimal requirement for the two dominant components in a VLSI implementation has been achieved: only 4 complex multipliers and a 1024-complex-word data memory for the pipelined 1K FFT processor.
Abstract: The design and implementation of a 1024-point pipeline FFT processor is presented. The architecture is based on a new form of FFT, the radix-2^2 algorithm. By exploiting the spatial regularity of the new algorithm, a minimal requirement for the two dominant components in a VLSI implementation has been achieved: only 4 complex multipliers and a 1024-complex-word data memory for the pipelined 1K FFT processor. The chip has been implemented in a 0.5 µm CMOS technology and occupies an area of 40 mm². With a 3.3 V power supply, it can compute 2^n-point (n = 0, 1, ..., 10) forward and inverse FFTs in real time at sampling frequencies up to 30 MHz. The SQNR is above 50 dB for white-noise input.

243 citations


Journal ArticleDOI
01 Sep 1998
TL;DR: A detailed survey of yield-enhancement techniques for very-large-scale-integration (VLSI) circuits, in which the authors review statistical yield-prediction models and illustrate the techniques by describing the design of several representative defect-tolerant VLSI circuits.
Abstract: Current very-large-scale-integration (VLSI) technology allows the manufacture of large-area integrated circuits with submicrometer feature sizes, enabling designs with several million devices. However, imperfections in the fabrication process result in yield-reducing manufacturing defects, whose severity grows proportionally with the size and density of the chip. Consequently, the development and use of yield-enhancement techniques at the design stage, to complement existing efforts at the manufacturing stage, is economically justifiable. Design-stage yield-enhancement techniques are aimed at making the integrated circuit "defect tolerant", i.e., less sensitive to manufacturing defects. They include incorporating redundancy into the design, modifying the circuit floorplan, and modifying its layout. Successful designs of defect-tolerant chips must rely on accurate yield projections. This paper reviews the currently used statistical yield-prediction models and their application to defect-tolerant designs. We then provide a detailed survey of various yield-enhancement techniques and illustrate their use by describing the design of several representative defect-tolerant VLSI circuits.
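
As one example of the statistical yield models the paper reviews, the widely used negative-binomial model ties yield to chip area, defect density, and a clustering parameter, with the Poisson model as its unclustered limit. The parameter values below are invented for illustration.

```python
import math

# Negative-binomial yield model, standard in the defect-tolerance
# literature:  Y = (1 + A * D0 / alpha) ** (-alpha)
# A: chip area (cm^2), D0: defect density (defects/cm^2),
# alpha: defect clustering parameter (alpha -> inf gives Poisson).
def yield_nb(area, d0, alpha):
    return (1.0 + area * d0 / alpha) ** (-alpha)

def yield_poisson(area, d0):
    return math.exp(-area * d0)

print(yield_nb(1.0, 0.5, 2.0))      # ~0.64 with clustered defects
print(yield_poisson(1.0, 0.5))      # ~0.61 in the unclustered limit
```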

236 citations


Journal ArticleDOI
TL;DR: This paper presents a tutorial of the principles of wave-pipelining and a survey of wave-pipelined VLSI chips and CAD tools for the synthesis and analysis of wave-pipelined circuits.
Abstract: Wave-pipelining is a method of high-performance circuit design which implements pipelining in logic without the use of intermediate latches or registers. The combination of high-performance integrated circuit (IC) technologies, pipelined architectures, and sophisticated computer-aided design (CAD) tools has converted wave-pipelining from a theoretical oddity into a realistic, although challenging, VLSI design method. This paper presents a tutorial of the principles of wave-pipelining and a survey of wave-pipelined VLSI chips and CAD tools for the synthesis and analysis of wave-pipelined circuits.
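
The defining timing property of wave-pipelining is that the minimum clock period scales with the spread of combinational path delays rather than with the maximum delay alone. Below is a sketch of one commonly quoted form of the constraint, with illustrative constants; the exact terms vary across the wave-pipelining literature.

```python
# Wave-pipelining clock-period bound: successive data "waves" travel
# through the same unlatched logic, so a new wave may be launched as
# soon as it cannot catch up with the previous one.
def min_clock_period(d_max, d_min, t_setup, t_hold, t_skew):
    return (d_max - d_min) + t_setup + t_hold + 2 * t_skew

d_max, d_min = 10.0, 8.5   # ns: longest/shortest register-to-register paths
print(min_clock_period(d_max, d_min, 0.3, 0.2, 0.1))  # ~2.2 ns, far below d_max
```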

235 citations


Journal ArticleDOI
TL;DR: An overview of a comprehensive collection of on-line testing techniques for VLSI, ranging from self-checking design and signature monitoring to fail-safe techniques that avoid complex discrete-component interfaces and radiation-hardened designs that avoid expensive fabrication processes such as SOI.
Abstract: This paper presents an overview of a comprehensive collection of on-line testing techniques for VLSI. Such techniques include, for instance: self-checking design, allowing high-quality concurrent checking at a hardware cost drastically lower than duplication; signature monitoring, allowing low-cost concurrent error detection for FSMs; on-line monitoring of reliability-relevant parameters such as current, temperature, abnormal delay, signal activity during steady state, radiation dose, clock waveforms, etc.; exploitation of standard BIST, or implementation of BIST techniques specific to on-line testing (transparent BIST, built-in concurrent self-test, ...); exploitation of scan paths to transfer internal states for performing various tasks for on-line testing or fault tolerance; fail-safe techniques for VLSI, avoiding complex fail-safe interfaces built from discrete components; and radiation-hardened designs, avoiding expensive fabrication processes such as SOI.

234 citations


Book
01 Jan 1998
TL;DR: One of the first books on the subject, this guide covers all stages of design and focuses on the algorithms that are the building blocks of the design automation software that generates the layout of VLSI circuits.
Abstract: From the Publisher: Modern microprocessors such as Intel's Pentium chip typically contain millions of transistors. Known generically as Very Large-Scale Integrated (VLSI) systems, these chips have a scale and complexity that has necessitated the development of CAD tools to automate their design. This book focuses on the algorithms that are the building blocks of the design automation software that generates the layout of VLSI circuits. One of the first books on the subject, this guide covers all stages of design.

Book
01 Jan 1998
TL;DR: This book provides an introduction to the foundations of this interdisciplinary research area, emphasizing its applications in computer aided circuit design.
Abstract: From the Publisher: This book provides an introduction to the foundations of this interdisciplinary research area, emphasizing its applications in computer-aided circuit design.

Journal ArticleDOI
H.H. Chen1, J.S. Neely1
TL;DR: This integrated chip-and-package model provides a complete analysis of the resistive IR drop, inductive delta-I noise, and the on-chip Vdd distribution and allows designers to identify the hot spots on the chip and optimize design variables to minimize the noise.
Abstract: This paper describes the interconnect and circuit modeling techniques to analyze the on-chip power supply noise for high-performance very large scale integration (VLSI) design. To reduce the complexity of full-chip analysis, a hierarchical power supply distribution model, which consists of a 12×12 package model, a 50×50 on-chip power bus model, and a distributed switching circuit model, is developed. This integrated chip-and-package model provides a complete analysis of the resistive IR drop, inductive delta-I noise, and the on-chip Vdd distribution. It also allows designers to identify the hot spots on the chip and optimize design variables to minimize the noise. Analysis results of our benchmark microprocessor chips will be presented to demonstrate the various applications of this methodology.
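
A deliberately simplified 1-D illustration of resistive IR drop (the paper's model is a hierarchical 2-D chip-and-package network): current drawn at each tap flows through, and drops voltage across, every upstream rail segment.

```python
# Toy IR-drop calculation along a single power rail. Segment j carries
# the total current of tap j and everything downstream of it.
def ir_drop(vdd, r_segment, tap_currents):
    v = vdd
    drops = []
    remaining = sum(tap_currents)        # current through the first segment
    for i in tap_currents:
        v -= r_segment * remaining       # all downstream current flows here
        drops.append(vdd - v)
        remaining -= i                   # this tap's current leaves the rail
    return drops

print(ir_drop(1.8, 0.05, [0.1, 0.1, 0.2]))  # volts of drop at each tap
```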

Journal ArticleDOI
16 Aug 1998
TL;DR: The sum-product algorithm (belief/probability propagation) can be naturally mapped into analog transistor circuits, which enable the construction of analog-VLSI decoders for turbo codes, low-density parity-check codes, and similar codes.
Abstract: The sum-product algorithm (belief/probability propagation) can be naturally mapped into analog transistor circuits. These circuits enable the construction of analog-VLSI decoders for turbo codes, low-density parity-check codes, and similar codes.
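
For concreteness, in the log-likelihood-ratio (LLR) domain the sum-product check-node update that such analog circuits compute is L_out = 2 atanh(prod_j tanh(L_j / 2)); a scalar sketch:

```python
import math

# Sum-product (belief propagation) check-node update in the LLR domain.
# The analog decoders realize the tanh relationship with transistor
# circuits; this is only the numerical form of the update.
def check_node_update(incoming_llrs):
    p = 1.0
    for llr in incoming_llrs:
        p *= math.tanh(llr / 2.0)
    return 2.0 * math.atanh(p)

print(check_node_update([1.2, -0.4, 2.0]))  # combined extrinsic belief
```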

Book ChapterDOI
01 Sep 1998
TL;DR: This work has shown that multiplexing is an effective way of leveraging the five-order-of-magnitude difference in bandwidth between a neuron and a digital bus, enabling us to replace dedicated point-to-point connections among thousands of neurons with a handful of high-speed connections and thousands of switches.
Abstract: The small number of input-output connections available with standard chip-packaging technology, and the small number of routing layers available in VLSI technology, place severe limitations on the degree of intra- and interchip connectivity that can be realized in multichip neuromorphic systems. Inspired by the success of time-division multiplexing in communications [16] and computer networks [19], many researchers have adopted multiplexing to solve the connectivity problem [12, 67, 17]. Multiplexing is an effective way of leveraging the five-order-of-magnitude difference in bandwidth between a neuron (hundreds of Hz) and a digital bus (tens of megahertz), enabling us to replace dedicated point-to-point connections among thousands of neurons with a handful of high-speed connections and thousands of switches (transistors). This approach pays off in VLSI technology because transistors take up a lot less area than wires, and are becoming relatively more and more compact as the fabrication process scales down to deep submicron feature sizes.
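
The bandwidth arithmetic can be checked directly; the rates below are illustrative stand-ins for "hundreds of Hz" and "tens of megahertz".

```python
# Back-of-the-envelope version of the multiplexing argument: a shared
# digital bus at tens of MHz can carry the spike traffic of roughly 1e5
# neurons firing at hundreds of Hz.
neuron_rate_hz = 300       # a neuron fires at hundreds of Hz
bus_rate_hz = 30e6         # a digital bus moves tens of mega-events/s
print(f"{bus_rate_hz / neuron_rate_hz:.0f} neurons per shared bus")  # 100000
```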

Journal ArticleDOI
TL;DR: The results suggest that OE-VLSI integration offers substantial power and speed improvements even when relatively small numbers of photonic devices are driven with commodity complementary metal-oxide-semiconductor logic technologies.
Abstract: Optoelectronic-VLSI (OE-VLSI) technology represents the intimate integration of photonic devices with silicon VLSI electronics. We review the motivations and status of emerging OE-VLSI technologies and examine the performance of OE-VLSI technology versus conventional wire-bonded OE packaging. The results suggest that OE-VLSI integration offers substantial power and speed improvements even when relatively small numbers of photonic devices are driven with commodity complementary metal-oxide-semiconductor logic technologies.

Proceedings ArticleDOI
23 Feb 1998
TL;DR: This paper presents innovative encoding techniques suitable for minimizing the switching activity of system-level address buses, and targets the reduction of the average number of bus line transitions per clock cycle.
Abstract: The power dissipated by system-level buses is the largest contribution to the global power of complex VLSI circuits. Therefore, the minimization of the switching activity at the I/O interfaces can provide significant savings on the overall power budget. This paper presents innovative encoding techniques suitable for minimizing the switching activity of system-level address buses. In particular, the schemes illustrated here target the reduction of the average number of bus line transitions per clock cycle. Experimental results, conducted on address streams generated by a real microprocessor, have demonstrated the effectiveness of the proposed methods.
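
The paper's own address-bus codes are not reproduced here, but the classic bus-invert scheme illustrates the general principle of spending one redundant line to cut transitions; this sketch inverts a word whenever that reduces the Hamming distance to the previous bus value.

```python
# Transition counting plus the classic bus-invert code, shown only as a
# *representative* low-power encoding; the paper's address-bus schemes
# differ in detail. The transition on the extra invert line itself is
# ignored here for simplicity.
def hamming(a, b):
    return bin(a ^ b).count("1")

def bus_invert(prev, word, width=8):
    """Send ~word and raise the invert flag when that halves transitions."""
    if hamming(prev, word) > width // 2:
        return ~word & ((1 << width) - 1), 1
    return word, 0

prev = 0b00000000
for addr in (0b11111110, 0b11111111, 0b00000001):
    sent, inv = bus_invert(prev, addr)
    print(f"sent={sent:08b} invert={inv} transitions={hamming(prev, sent)}")
    prev = sent
```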

01 Jan 1998
TL;DR: This thesis presents the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor, a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle.
Abstract: Most previous research into vector architectures has concentrated on supercomputing applications and small enhancements to existing vector supercomputer implementations. This thesis expands the body of vector research by examining designs appropriate for single-chip full-custom vector microprocessor implementations targeting a much broader range of applications. I present the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor. T0 is a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle. T0 demonstrates that vector architectures are well suited to full-custom VLSI implementation and that they perform well on many multimedia and human-machine interface tasks. The remainder of the thesis contains proposals for future vector microprocessor designs. I show that the most area-efficient vector register file designs have several banks with several ports, rather than many banks with few ports as used by traditional vector supercomputers, or one bank with many ports as used by superscalar microprocessors. To extend the range of vector processing, I propose a vector flag processing model which enables speculative vectorization of "while" loops. To improve the performance of inexpensive vector memory systems, I introduce virtual processor caches, a new form of primary vector cache which can convert some forms of strided and indexed vector accesses into unit-stride bursts.

Journal ArticleDOI
TL;DR: A new chip-level electrothermal timing simulator for CMOS VLSI circuits is presented, and temperature-dependent reliability and timing problems of VLSI circuits can be accurately identified.
Abstract: In this paper, we present ILLIADS-T, a new chip-level electrothermal timing simulator for CMOS VLSI circuits. Given the chip layout, the packaging specification, and the periodic input signal pattern, it finds the on-chip steady-state temperature profile and the resulting circuit performance. A tester chip has been designed for verification of ILLIADS-T, and very good agreement between simulation and experiment was found. Using this electrothermal simulator, temperature-dependent reliability and timing problems of VLSI circuits can be accurately identified.

Journal ArticleDOI
01 Apr 1998
TL;DR: In this article, the recent advances of silicon-on-insulator (SOI) technology for complementary metal-oxide-semiconductor (CMOS) very-large-scale-integration memory and logic applications are considered.
Abstract: This paper reviews the recent advances of silicon-on-insulator (SOI) technology for complementary metal-oxide-semiconductor (CMOS) very-large-scale-integration memory and logic applications. Static random access memories (SRAMs), dynamic random access memories (DRAMs), and digital CMOS logic circuits are considered. Particular emphases are placed on the design issues and advantages resulting from the unique SOI device structure. The impact of floating-body in partially depleted devices on the circuit operation, stability, and functionality are addressed. The use of smart-body contact to improve the power and delay performance is discussed, as are global design issues.

Proceedings ArticleDOI
Byron L. Krauter1, Sharad Mehrotra1
01 May 1998
TL;DR: This work proposes a rules-based method that efficiently and accurately captures the high and low frequency characteristics directly from layout shapes, and subsequently synthesizes a simple frequency independent ladder circuit suitable for timing analysis.
Abstract: It is well understood that frequency-independent lumped-element circuits can be used to accurately model proximity and skin effects in transmission lines [7]. Furthermore, it is also understood that these circuits can be synthesized knowing only the high- and low-frequency resistances and inductances [4]. Existing VLSI extraction tools, however, are not efficient enough to solve for the frequency-dependent resistances and inductances on large VLSI layouts, nor do they synthesize circuits suitable for timing analysis. We propose a rules-based method that efficiently and accurately captures the high- and low-frequency characteristics directly from layout shapes, and subsequently synthesizes a simple frequency-independent ladder circuit suitable for timing analysis. We compare our results to other simulation results.

Journal ArticleDOI
TL;DR: Augmenting a conventional systolic-architecture-based VLSI motion estimator with the estimation technique reduces the power consumption by a factor of 2, while still preserving the optimal solution and the throughput.
Abstract: This paper presents an architectural enhancement to reduce the power consumption of full-search block-matching (FSBM) motion estimation. Our approach is based on eliminating unnecessary computation using conservative approximation. Augmenting a conventional systolic-architecture-based VLSI motion estimator with the estimation technique reduces the power consumption by a factor of 2, while still preserving the optimal solution and the throughput. A register-transfer-level implementation as well as simulation results on benchmark video clips are presented.
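
The computation-elimination idea can be sketched in software (the paper's contribution is a systolic-hardware realization of it): abandon a candidate block as soon as its partial sum of absolute differences (SAD) exceeds the best complete SAD so far, which saves arithmetic without changing the winning match.

```python
# Conservative early termination for block matching: the partial SAD only
# grows, so once it passes the current best it can be abandoned safely.
def sad_with_early_exit(block_a, block_b, best_so_far):
    partial = 0
    for a, b in zip(block_a, block_b):
        partial += abs(a - b)
        if partial >= best_so_far:      # cannot beat the current best
            return None                 # abandon this candidate
    return partial

best = float("inf")
for candidate in ([10, 12, 9, 11], [50, 60, 55, 58], [10, 11, 10, 11]):
    s = sad_with_early_exit([10, 11, 10, 10], candidate, best)
    if s is not None:
        best = s
print(best)  # optimal SAD, identical to the exhaustive computation
```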

Journal ArticleDOI
TL;DR: Compared to other techniques for fault tolerance in FPGAs, these methods are shown to provide significantly greater yield improvement, and a 35 percent non-FT chip yield for a 16×16 FPGA is more than doubled.
Abstract: The very high levels of integration and submicron device sizes used in current and emerging VLSI technologies for FPGAs lead to higher occurrences of defects and operational faults. Thus, there is a critical need for fault tolerance and reconfiguration techniques for FPGAs to increase chip yields (with factory reconfiguration) and/or system reliability (with field reconfiguration). We first propose techniques utilizing the principle of node-covering to tolerate logic or cell faults in SRAM-based FPGAs. A routing discipline is developed that allows each cell to cover (to be able to replace) its neighbor in a row. Techniques are also proposed for tolerating wiring faults by means of replacement with spare portions. The replaceable portions can be individual segments, or else sets of segments, called "grids". Fault detection in the FPGAs is accomplished by separate testing, either at the factory or by the user. If reconfiguration around faulty cells and wiring is performed at the factory (with laser-burned fuses, for example), it is completely transparent to the user. In other words, user configuration data loaded into the SRAM remains the same, independent of whether the chip is defect-free or whether it has been reconfigured around defective cells or wiring, a major advantage for hardware vendors who design and sell FPGA-based logic (e.g., glue logic in microcontrollers, video cards, DSP cards) in production-scale quantities. Compared to other techniques for fault tolerance in FPGAs, our methods are shown to provide significantly greater yield improvement, and a 35 percent non-FT chip yield for a 16×16 FPGA is more than doubled.

Proceedings ArticleDOI
31 May 1998
TL;DR: The proposed approach is based on a re-ordering of the vectors in the test sequence to minimize the switching activity of the circuit during test application and guarantees a decrease in power consumption and heat dissipation.
Abstract: This paper considers the problem of testing VLSI integrated circuits without exceeding their power ratings during test. The proposed approach is based on a re-ordering of the vectors in the test sequence to minimize the switching activity of the circuit during test application. Our technique uses the Hamming distance between test vectors and guarantees a decrease in power consumption and heat dissipation without modifying the initial fault coverage. Results of experiments are presented at the end of this paper and show a reduction of circuit activity in the range of 8.2% to 54.1% during test application.
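
A minimal sketch of Hamming-distance-guided re-ordering (a plausible greedy variant; the paper's exact procedure may differ): repeatedly append the unused vector closest to the previously applied one, so that consecutive test vectors flip as few inputs as possible.

```python
# Greedy nearest-neighbour ordering of a test set by Hamming distance.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def reorder(vectors):
    remaining = list(vectors)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda v: hamming(order[-1], v))
        remaining.remove(nxt)
        order.append(nxt)
    return order

tests = ["0000", "1111", "0001", "1110"]
print(reorder(tests))  # ['0000', '0001', '1111', '1110']: 5 flips vs 12
```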

Journal ArticleDOI
TL;DR: In this paper, the authors prove the necessity for extending the system of design rules, propose a thermal design rule, and present an efficient and quantitatively accurate thermal simulator as a tool for the design process.
Abstract: In this paper, self-heating of interconnects is shown to affect the lifetime of next-generation integrated circuits significantly more severely than that of today's. The paper proves the necessity of extending the system of design rules, proposes a thermal design rule, and presents an efficient and quantitatively accurate thermal simulator as a tool for the design process.

Proceedings ArticleDOI
13 Sep 1998
TL;DR: A new VLSI-oriented fast Fourier transform (FFT) algorithm, radix-2/4/8, which can effectively minimize the number of complex multiplications and is used to design an 8K FFT ASIC for the DVB application in a 0.6 µm, 3.3 V triple-metal CMOS process.
Abstract: In this paper, we present a new VLSI-oriented fast Fourier transform (FFT) algorithm, radix-2/4/8, which can effectively minimize the number of complex multiplications. This algorithm can be implemented efficiently using a pipelined architecture. Based on this pipelined architecture, an 8K FFT ASIC is designed for use in the DVB (Digital Video Broadcasting) application in a 0.6 µm, 3.3 V triple-metal CMOS process.

Proceedings ArticleDOI
Phillip J. Restle1, Alina Deutsch1
11 Jun 1998
TL;DR: Novel modeling and measurement techniques are used to investigate on-chip transmission-line effects that are important for high performance clock distribution networks.
Abstract: Clock distribution has become an increasingly challenging problem for VLSI designs, consuming an increasing fraction of resources such as wiring, power, and design time. Unwanted differences or uncertainties in clock network delays degrade performance or cause functional errors. Three dramatically different strategies being used in the VLSI industry to address these challenges are compared. Novel modeling and measurement techniques are used to investigate on-chip transmission-line effects that are important for high performance clock distribution networks.

Book ChapterDOI
23 Sep 1998
TL;DR: After defining an ‘operational envelope’ of robustness, the feasibility of performing fitness evaluations in widely varying physical conditions in order to provide a selection-pressure for robustness is demonstrated and preliminary experimental results are encouraging.
Abstract: ‘Unconstrained intrinsic hardware evolution’ allows an evolutionary algorithm freedom to find the forms and processes natural to a reconfigurable VLSI medium. It has been shown to produce highly unconventional but extremely compact FPGA configurations for simple tasks, but these circuits are usually not robust enough to be useful: they malfunction if used on a slightly different FPGA, or at a different temperature. After defining an ‘operational envelope’ of robustness, the feasibility of performing fitness evaluations in widely varying physical conditions in order to provide a selection-pressure for robustness is demonstrated. Preliminary experimental results are encouraging.

Proceedings ArticleDOI
09 Jan 1998
TL;DR: It is shown that the average macroblock (MB) complexity per arbitrarily shaped P-VOP varies significantly over time for the encoder, with only minor variations for the decoder.
Abstract: A complexity analysis of the video part of the emerging ISO/IEC MPEG-4 standard was performed as a basis for HW/SW partitioning for VLSI implementation of a portable MPEG-4 terminal. While the computational complexity of previously standardized video coding schemes was predictable for I-, P-, and B-frames over time, the support of arbitrarily shaped visual objects as well as various coding options within MPEG-4 now introduces content-dependent computational requirements with significant variance. In this paper, the results of a time-dependent complexity analysis of the encoding and decoding process of a binary-shape-coded video object (VO), and a comparison with a rectangular-shaped VO, are given for the complete codec as well as for the individual tools of the encoding and decoding process. It is shown that the average macroblock (MB) complexity per arbitrarily shaped P-VOP varies significantly over time for the encoder, with only minor variations for the decoder.

Journal ArticleDOI
TL;DR: An algorithm for automatically restructuring the controllers of the data paths in which variable-latency units have been introduced is formulated, and results show an average throughput improvement exceeding 27%, at the price of a modest area increase.
Abstract: This paper introduces a novel optimization paradigm for increasing the throughput of digital systems. The basic idea consists of transforming fixed-latency units into variable-latency ones that run with a faster clock cycle. The transformation is fully automatic and can be used in conjunction with traditional design techniques to improve the overall performance of speed-critical units. In addition, we introduce procedures for reducing the area overhead of the modified units, and we formulate an algorithm for automatically restructuring the controllers of the data paths in which variable-latency units have been introduced. Results, obtained on a large set of benchmark circuits, show an average throughput improvement exceeding 27%, at the price of a modest area increase (less than 8% on average).
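
The throughput arithmetic behind the idea is simple. With invented numbers (not taken from the paper), a unit whose fixed-latency clock is 10 ns is re-clocked at 6 ns so that 90% of operations complete in one cycle and the remainder take two:

```python
# Expected-throughput comparison of fixed- vs variable-latency operation.
fixed_cycle = 10.0                      # ns, fixed-latency clock period
fast_cycle = 6.0                        # ns, variable-latency clock period
p_one_cycle = 0.9                       # fraction finishing in one cycle
avg_cycles = p_one_cycle * 1 + (1 - p_one_cycle) * 2
avg_time = fast_cycle * avg_cycles      # 6.6 ns per operation on average
print(f"throughput gain: {fixed_cycle / avg_time - 1:.0%}")  # ~52%
```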