
Showing papers on "Very-large-scale integration" published in 2002


Book
12 Sep 2002
TL;DR: This book systematically takes the reader from basic concepts to advanced topics, establishing both rigor and intuition in the design of high-speed integrated circuits for optical communication systems.
Abstract: Design of Integrated Circuits for Optical Communications deals with the design of high-speed integrated circuits for optical communication systems. Written for both students and practicing engineers, the book systematically takes the reader from basic concepts to advanced topics, establishing both rigor and intuition. The text emphasizes analysis and design in modern VLSI technologies, particularly CMOS, and presents numerous broadband circuit techniques. Leading researcher Behzad Razavi is also the author of Design of Analog CMOS Integrated Circuits. Table of contents 1 Introduction to Optical Communications 2 Basic Concepts 3 Optical Devices 4 Transimpedance Amplifiers 5 Limiting Amplifiers and Output Buffers 6 Oscillator Fundamentals 7 LC Oscillators 8 Phase-Locked Loops 9 Clock and Data Recovery 10 Multiplexers and Laser Drivers

693 citations


Journal ArticleDOI
TL;DR: A performance analysis of 1-bit full-adder cell is presented, after the adder cell is anatomized into smaller modules, and several designs of each of them are developed, prototyped, simulated and analyzed.
Abstract: A performance analysis of 1-bit full-adder cell is presented. The adder cell is anatomized into smaller modules. The modules are studied and evaluated extensively. Several designs of each of them are developed, prototyped, simulated and analyzed. Twenty different 1-bit full-adder cells are constructed (most of them are novel circuits) by connecting combinations of different designs of these modules. Each of these cells exhibits different power consumption, speed, area, and driving capability figures. Two realistic circuit structures that include adder cells are used for simulation. A library of full-adder cells is developed and presented to the circuit designers to pick the full-adder cell that satisfies their specific applications.

454 citations
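
For orientation, the Boolean function that any 1-bit full-adder cell has to realize can be sketched as follows. This covers only the logic behaviour (sum as a three-input XOR, carry as a majority function), not the transistor-level modules compared in the paper. The names `full_adder` and `ripple_add` are illustrative and do not come from the source.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Return (sum, carry_out) for three input bits."""
    s = a ^ b ^ cin                      # sum: three-input XOR
    cout = (a & b) | (cin & (a ^ b))     # carry: majority of the three inputs
    return s, cout


def ripple_add(x: int, y: int, width: int = 8) -> int:
    """Chain `width` full-adder cells into a ripple-carry adder."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    # The final carry becomes bit `width` of the result.
    return result | (carry << width)
```

Chaining the cell into a ripple-carry adder is one simple way to exercise it in a larger structure; the paper's own test structures are not specified here.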


Journal ArticleDOI
TL;DR: An architecture is proposed that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000; it consists of two row processors, two column processors, and two memory modules.
Abstract: We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μ technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.

350 citations
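
To make the idea of a lifting-based DWT concrete, here is a minimal software sketch of one 1-D decomposition level using the reversible integer 5/3 filter, one of the JPEG2000 default filters. It is not the paper's hardware architecture and covers only one of the seven filters. It assumes an even-length integer input and whole-sample symmetric extension at the boundaries, and the function names are illustrative.

```python
def dwt53_forward(x):
    """One level of the integer 5/3 lifting transform: returns (lowpass, highpass)."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]   # symmetric extension
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update step: each even sample plus a rounded quarter of the neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]                 # symmetric extension
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward by running the lifting steps in reverse order."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    odd = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        odd.append(d[i] + ((even[i] + right) >> 1))
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Each lifting step uses only additions and shifts for this filter, and the inverse simply replays the same steps with the signs flipped, which is why the transform reconstructs integer input exactly.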


Book
15 Nov 2002
TL;DR: This book presents the central concepts required for the creative and successful design of analog VLSI circuits and discusses device physics, linear and nonlinear circuit forms, translinear circuits, photodetectors, floating-gate devices, noise analysis, and process technology.
Abstract: From the Publisher: Neuromorphic engineers work to improve the performance of artificial systems through the development of chips and systems that process information collectively using primarily analog circuits. This book presents the central concepts required for the creative and successful design of analog VLSI circuits. The discussion is weighted toward novel circuits that emulate natural signal processing. Unlike most circuits in commercial or industrial applications, these circuits operate mainly in the subthreshold or weak inversion region. Moreover, their functionality is not limited to linear operations, but also encompasses many interesting nonlinear operations similar to those occurring in natural systems. Topics include device physics, linear and nonlinear circuit forms, translinear circuits, photodetectors, floating-gate devices, noise analysis, and process technology.

291 citations


Proceedings ArticleDOI
C. Constantinescu
23 Jun 2002
TL;DR: It is concluded that the semiconductor industry is approaching a new stage in the design and manufacturing of VLSI circuits, and that fault-tolerance features, specific to custom designed computers, have to be integrated into commercial-off-the-shelf (COTS) VLSI systems in the future, in order to preserve data integrity and limit the impact of transient and intermittent faults.
Abstract: Advances in semiconductor technology have led to impressive performance gains of VLSI circuits, in general, and microprocessors, in particular. However, smaller transistor and interconnect dimensions, lower power voltages, and higher operating frequencies have contributed to increased rates of occurrence of transient and intermittent faults. We address the impact of deep submicron technology on permanent, transient and intermittent classes of faults, and discuss the main trends in circuit dependability. Two case studies exemplify this analysis. The first one deals with intermittent faults induced by manufacturing residuals. The second case study shows that transients generated by timing violations are capable of silently corrupting data. It is concluded that the semiconductor industry is approaching a new stage in the design and manufacturing of VLSI circuits. Fault-tolerance features, specific to custom designed computers, have to be integrated into commercial-off-the-shelf (COTS) VLSI systems in the future, in order to preserve data integrity and limit the impact of transient and intermittent faults.

195 citations


Proceedings ArticleDOI
04 Mar 2002
TL;DR: This paper proposes a novel methodology to incorporate congestion minimization within logic synthesis, and presents results for industrial circuits that validate the approach.
Abstract: In this era of Deep Sub-Micron (DSM) technologies, the impact of interconnects is becoming increasingly important as it relates to integrated circuit (IC) functionality and performance. In the traditional top-down IC design flow, interconnect effects are first taken into account during logic synthesis by way of wireload models. However, for technologies of 0.25 μm and below, the wiring capacitance dominates the gate capacitance and the delay estimation based on fanout and design legacy statistics can be highly inaccurate. In addition, logic block size is no longer dictated solely by total cell area, and is often limited by wiring area resources. For these reasons, wiring congestion is an extremely important design factor, and should be taken into consideration at the earliest possible stages of the design flow. In this paper we propose a novel methodology to incorporate congestion minimization within logic synthesis, and present results for industrial circuits that validate our approach.

177 citations


Proceedings ArticleDOI
10 Jun 2002
TL;DR: Two techniques are presented for efficient gate clustering in MTCMOS circuits, modeling the problem via Bin-Packing and Set-Partitioning; both offer significant reductions in dynamic and leakage power over previous techniques during the active and standby modes, respectively.
Abstract: Reducing power dissipation is one of the most important subjects in VLSI design today. Scaling causes subthreshold leakage currents to become a large component of total power dissipation. This paper presents two techniques for efficient gate clustering in MTCMOS circuits by modeling the problem via Bin-Packing (BP) and Set-Partitioning (SP) techniques. An automated solution is presented, and both techniques are applied to six benchmarks to verify functionality. Both methodologies offer significant reduction in both dynamic and leakage power over previous techniques during the active and standby modes, respectively. Furthermore, the SP technique takes the circuit's routing complexity into consideration, which is critical for Deep Sub-Micron (DSM) implementations. Sufficient performance is achieved, while significantly reducing the overall sleep transistors' area. Results obtained indicate that our proposed techniques can achieve on average 90% savings for leakage power and 15% savings for dynamic power.

174 citations
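
The bin-packing view of gate clustering can be illustrated with a generic first-fit-decreasing heuristic: gates are "items" weighted by the current they draw, and each sleep transistor is a "bin" with a fixed current budget. This is a textbook heuristic used here for illustration, not necessarily the formulation or solver used in the paper. The gate names, current values, and capacity below are hypothetical.

```python
def first_fit_decreasing(currents, capacity):
    """Pack gates into clusters so that no cluster exceeds `capacity`.

    currents: dict mapping a gate name to its current demand (arbitrary units).
    Returns a list of clusters, each a dict with keys "load" and "gates".
    """
    clusters = []
    # Place the largest gates first; each goes into the first cluster with room.
    for gate, cur in sorted(currents.items(), key=lambda kv: kv[1], reverse=True):
        if cur > capacity:
            raise ValueError(f"gate {gate} alone exceeds the cluster capacity")
        for cluster in clusters:
            if cluster["load"] + cur <= capacity:
                cluster["gates"].append(gate)
                cluster["load"] += cur
                break
        else:
            # No existing cluster has room: open a new one (a new sleep transistor).
            clusters.append({"load": cur, "gates": [gate]})
    return clusters


# Hypothetical example data, not taken from the paper.
example = {"g1": 50, "g2": 40, "g3": 30, "g4": 30, "g5": 20, "g6": 10}
packed = first_fit_decreasing(example, capacity=90)
```

Fewer clusters means fewer sleep transistors, which is the intuition behind treating clustering as a packing problem.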


Patent
Jai P. Bansal
19 Dec 2002
TL;DR: In this article, a gate array core cell is proposed to reduce the overall wiring lengths, parasitic capacitance, and increase the circuit density and performance of gate array ASIC components, but with the advantage of reducing mask cost and processing time by about 50 percent.
Abstract: A very efficient gate array core cell in which the base core cell consists of a group of 6 PMOS transistors and a group of 6 NMOS transistors. It also includes pre-wiring of 2 of the 6 PMOS transistors, with 2 of the 6 NMOS transistors at polysilicon level or at local interconnect level while leaving the remaining PMOS and NMOS transistors as individual transistors to be interconnected during the functional ASIC metallization process. The core cell also has 2 polysilicon or local interconnect wires embedded in it, which can be used to interconnect transistors for logic function implementation. The core cell defined in this invention is highly flexible and has been analyzed to interconnect all types of logic and memory functions needed for ASIC designs. The layout of the transistors, pre-wiring of the strategic transistors at polysilicon level or at local interconnect level, and embedded polysilicon or local interconnect wires reduce the core cell size significantly. This core cell design reduces the overall wiring lengths, parasitic capacitance, which in turn reduce delays, power dissipation and increase ASIC performance and circuit density. Gate array ASIC components designed using this core cell provide circuit density, performance and power dissipation characteristics comparable to the Standard Cell ASICs but with the advantage of reducing the mask cost and processing time by about 50 percent.

171 citations


Book
01 Jan 2002
TL;DR: This book discusses communication concepts from a circuit designer's perspective, receiver architectures, and frequency synthesizers (including loop filter and system design), as well as transmitter architectures and power amplifiers.
Abstract: Communication Concepts: Circuit Designer Perspective.- Receiver Architectures.- Low Noise Amplifier.- Active Mixer.- Passive Mixer.- Analog-to-Digital Converters.- Frequency Synthesizer: Phase/Frequency Processing Components.- Frequency Synthesizer: Loop Filter and System Design.- Transmitter Architectures and Power Amplifier

164 citations


Journal ArticleDOI
TL;DR: A new high-speed domino circuit, called HS-Domino, which resolves the tradeoff between performance and reliability in conventional CD-domino logic while dissipating low dynamic power with minimal area overhead and extends domino's operation in the deep submicron regime.
Abstract: A new high-speed domino circuit, called HS-Domino has been developed. HS-Domino resolves the tradeoff between performance and reliability in conventional CD-domino logic while dissipating low dynamic power with minimal area overhead. HS-Domino, therefore, extends domino's operation in the deep submicron regime. A multithreshold implementation of HS-Domino is also devised to achieve substantially low leakage values during standby, while maintaining high performance and low power during the active mode. Furthermore, the generic multithreshold scheme is applied to differential cascode voltage switch (DDCVS) logic.

136 citations


Proceedings ArticleDOI
07 Apr 2002
TL;DR: A fast but reliable way to detect routing criticalities in VLSI chips by using a congestion estimator for a dynamic avoidance of routability problems in one single run of the placement algorithm.
Abstract: We present a fast but reliable way to detect routing criticalities in VLSI chips. In addition, we show how this congestion estimation can be incorporated into a partitioning based placement algorithm. Different to previous approaches, we do not rerun parts of the placement algorithm or apply a post-placement optimization, but we use our congestion estimator for a dynamic avoidance of routability problems in one single run of the placement algorithm. Computational experiments on chips with up to 1,300,000 cells are presented: The framework reduces the usage of the most critical routing edges by 9.0% on average, the running time increase for the placement is about 8.7%. However, due to the smaller congestion, the running time of routing tools can be decreased drastically, so the total time for placement and (global) routing is decreased by 47% on average.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: The MIT approach to 3D VLSI integration is based on low-temperature Cu-Cu wafer bonding, where device wafers are bonded in a face-to-back manner, with short vertical vias and Cu-Cu pads as the inter-wafer throughway.
Abstract: The MIT approach to 3D VLSI integration is based on low-temperature Cu-Cu wafer bonding. Device wafers are bonded in a face-to-back manner, with short vertical vias and Cu-Cu pads as the inter-wafer throughway. In our scheme, there are several reliability criteria, which include: (a) structural integrity of the Cu-Cu bond; (b) Cu-Cu contact electrical characteristics; and (c) process flow efficiency and repeatability. In addition, CAD tools are needed to aid in design and layout of 3DICs. This paper discusses recent results in all these areas.

Journal ArticleDOI
TL;DR: Two architectures and VLSI implementations of the AES Proposal, Rijndael, are presented and these alternative architectures are operated both for encryption and decryption process to reduce the required hardware resources and achieve high-speed performance.
Abstract: Two architectures and VLSI implementations of the AES Proposal, Rijndael, are presented in this paper. These alternative architectures are operated both for encryption and decryption process. They reduce the required hardware resources and achieve high-speed performance. Their design philosophy is completely different. The first uses feedback logic and reaches a throughput value equal to 259 Mbit/sec. It performs efficiently in applications with low covered area resources. The second architecture is optimized for high-speed performance using pipelined technique. Its throughput can reach 3.65 Gbit/sec.

Journal ArticleDOI
TL;DR: An analytical expression characterizing the SSN voltage is presented, based on a lumped inductive-resistive-capacitive (RLC) model; the peak SSN voltage it predicts is within 10% of SPICE simulations.
Abstract: Simultaneous switching noise (SSN) has become an important issue in the design of the internal on-chip power distribution networks in current very large scale integration/ultra large scale integration (VLSI/ULSI) circuits. An inductive model is used to characterize the power supply rails when a transient current is generated by simultaneously switching the on-chip registers and logic gates in a synchronous CMOS VLSI/ULSI circuit. An analytical expression characterizing the SSN voltage is presented here based on a lumped inductive-resistive-capacitive RLC model. The peak value of the SSN voltage based on this analytical expression is within 10% as compared to SPICE simulations. Design constraints at both the circuit and layout levels are also discussed based on minimizing the effects of the peak value of the SSN voltage.

Proceedings ArticleDOI
28 Apr 2002
TL;DR: This paper proposes a technique based on the reconfiguration of scan chains to reduce test time and test data volume for Illinois Scan Architecture (ILS) based designs.
Abstract: As the complexity of VLSI circuits is increasing due to the exponential rise in transistor count per chip, testing cost is becoming an important factor in the overall integrated circuit (IC) manufacturing cost. This paper addresses the issue of decreasing test cost by lowering the test data bits and the number of clock cycles required to test a chip. We propose a technique based on the reconfiguration of scan chains to reduce test time and test data volume for Illinois Scan Architecture (ILS) based designs. This technique is presented with details of hardware implementation as well as the test generation and test application procedures. The reduction in test time and test data volume achieved using this technique is quite significant in most circuits.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: Circuit techniques to combat increasing switching and leakage power dissipation, poor leakage tolerance of large-signal cache arrays and register files, and worsening global on-chip interconnect scaling trend, are described.
Abstract: CMOS technology scaling is becoming difficult beyond 70 nm node, raising new design challenges for high-performance and low-power microprocessors. This paper discusses some of the key paradigm shifts required. Circuit techniques to combat (i) increasing switching and leakage power dissipation, (ii) poor leakage tolerance of large-signal cache arrays and register files, and (iii) worsening global on-chip interconnect scaling trend, are described.

Proceedings ArticleDOI
16 Dec 2002
TL;DR: This work presents a novel approach to bitwidth- or precision-analysis for floating-point designs, which involves analysing the dataflow graph representation of a design to see how sensitive the output of a node is to changes in the outputs of other nodes: higher sensitivity requires higher precision and hence more output bits.
Abstract: Automatic bitwidth analysis is a key ingredient for high-level programming of FPGAs and high-level synthesis of VLSI circuits. The objective is to find the minimal number of bits to represent a value in order to minimise the circuit area and to improve efficiency of the respective arithmetic operations, while satisfying user-defined numerical constraints. We present a novel approach to bitwidth- or precision-analysis for floating-point designs. The approach involves analysing the dataflow graph representation of a design to see how sensitive the output of a node is to changes in the outputs of other nodes: higher sensitivity requires higher precision and hence more output bits. We automate such sensitivity analysis by a mathematical method called automatic differentiation, which involves differentiating variables in a design with respect to other variables. We illustrate our approach by optimising the bitwidth for two examples, a discrete Fourier transform (DFT) implementation and a Finite Impulse Response (FIR) filter implementation.
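
The sensitivity analysis described above can be illustrated with a small forward-mode automatic differentiation sketch using dual numbers. It computes how strongly an output depends on each input of a toy three-tap FIR sum. How the paper turns these sensitivities into bit counts is not given here, so that step is left out. The class `Dual`, the helper `sensitivities`, and the coefficients are all illustrative assumptions.

```python
class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sensitivities(f, inputs):
    """Return d f / d inputs[i] for every i, seeding one input at a time."""
    grads = []
    for i in range(len(inputs)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(inputs)]
        grads.append(f(args).dot)
    return grads


def fir_tap_sum(x):
    """Toy 3-tap FIR output for one sample window (hypothetical coefficients)."""
    coeffs = [0.5, 0.25, 0.125]
    acc = Dual(0.0)
    for c, xi in zip(coeffs, x):
        acc = acc + c * xi
    return acc
```

For a linear filter the sensitivities are just the tap coefficients, so inputs feeding larger taps influence the output more; for a nonlinear node such as a product, the sensitivity depends on the operating point.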

Journal ArticleDOI
TL;DR: A new parallel semiconductor device simulation using the dynamic load balancing approach based on the adaptive finite volume method with a posteriori error estimation has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library.
Abstract: We present a new parallel semiconductor device simulation using the dynamic load balancing approach. This semiconductor device simulation based on the adaptive finite volume method with a posteriori error estimation has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library. A constructive monotone iterative technique is also applied for solution of the system of nonlinear algebraic equations. Two different parallel versions of the algorithm to perform a complete device simulation are proposed. The first is a dynamic parallel domain decomposition approach, and the second is a parallel current-voltage characteristic points simulation. This implementation shows that a well-designed load balancing simulation can significantly reduce the execution time up to an order of magnitude. Compared with the measured data, numerical results on various submicron VLSI devices are presented, to show the accuracy and efficiency of the method.

Patent
Jeff Solomon
10 Dec 2002
TL;DR: In this article, a VLSI layout editor and a method for using same that increases display and re-display speed and accuracy uses properties inherent to VLSI layouts that allow them to be displayed efficiently and accurately independent of the canonical expression of the design.
Abstract: A VLSI layout editor and method for using same that increases display and re-display speed and accuracy uses properties inherent to VLSI layouts that allows them to be displayed efficiently and accurately independent of the canonical expression of the VLSI design. The VLSI layout editor and methods for using same use precomputed images that each represent a portion of the VLSI layout, a hierarchy cache that includes multiple LOD versions of selected sub-designs in the pre-computed images, and selected direct determination of the viewable representation from the canonical expression for at least one LOD. Apparatus and methods according to the present invention can render a particular type of data whose canonical form is smaller than its corresponding displayed image thereof when the displayed image has geometric properties that allow heuristics and rasterization for dynamic and accurate expansion using selected combined techniques. Texture mapping and mipmapping can be used to accurately reduce, expand and reorder layers in a viewable image expanded from a canonical expression of the VLSI layout.

Proceedings ArticleDOI
28 Oct 2002
TL;DR: An efficient VLSI architecture is proposed to provide a variety of hardware implementations for improving and possibly minimizing the critical path and memory requirements of lifting-based discrete wavelet transforms by flipping conventional lifting structures.
Abstract: Using the lifting scheme to construct VLSI architectures for discrete wavelet transforms outperforms using convolution in many aspects, such as computation complexity and boundary extension. Nevertheless, the critical path of the lifting scheme is potentially longer than that of convolution. Although pipelining can reduce the critical path, it will prolong the latency and require more registers for a 1D architecture as well as larger memory size for a 2D line-based architecture. In this paper, an efficient VLSI architecture is proposed to provide a variety of hardware implementations for improving and possibly minimizing the critical path and memory requirements of lifting-based discrete wavelet transforms by flipping conventional lifting structures. By case studies of a JPEG2000 defaulted filter and an integer filter, the efficiency of the proposed flipping structure is shown.

Proceedings ArticleDOI
29 Jan 2002
TL;DR: A real-time localization and tracking algorithm has been developed for detecting human hands in video images using a single-pixel-based classification, so that a continuous data stream can be processed.
Abstract: The advantage of parallel computing of artificial neural networks can be combined with the potentials of VLSI circuits in order to design a real time detection and tracking system applied to video images. Based on these facts, a real-time localization and tracking algorithm has been developed for detecting human hands in video images. Due to the real time aspect, a single-pixel-based classification is aspired, so that a continuous data stream can be processed. Consequently, no storage of full images or parts of them is necessary. The classification, whether a pixel belongs to a hand or to the background, is done by analyzing the RGB-values of a single pixel by means of an artificial neural network. To obtain the full processing speed of this neural network a hardware solution is realized in a Field Programmable Gate Array (FPGA).

Proceedings ArticleDOI
Pasquale Cocchini
10 Nov 2002
TL;DR: This work proposes a new latency-aware technique for the performance-driven concurrent insertion of flip-flops and repeaters in VLSI circuits, and overwhelming evidence showing an exponential increase in the number of pipelined interconnects with process scaling is presented.
Abstract: For many years, CMOS process scaling has allowed a steady increase in the operating frequency and integration density of integrated circuits. Only recently, however, have we reached a point where it takes several clock cycles for global signals to traverse a complex digital system such as a modern microprocessor. Thus, interconnect latency must be taken into account in current and future design tools at the architectural as well as synthesis level. For this purpose, the author proposes a new latency-aware technique for the performance-driven concurrent insertion of flip-flops and repeaters in VLSI circuits. Overwhelming evidence showing an exponential increase in the number of pipelined interconnects with process scaling, for high-performance microprocessors as well as high-end ASICs, is also presented. This increase indicates a radical change in current design methodologies to cope with this new emerging problem.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: A VLSI architecture for interpolation is proposed that uses a transformation of the received word to reduce the number of iterations of the interpolation algorithm; it is also shown how the memory requirements can be reduced and how an important operation, the Hasse derivative, can be efficiently implemented in VLSI.
Abstract: The Koetter-Vardy algorithm is an algebraic soft-decision decoder for Reed-Solomon codes which is based on the Guruswami-Sudan list decoder. There are three main steps: 1) multiplicity calculation, 2) interpolation and 3) root finding. The Koetter-Vardy algorithm is challenging to implement due to the high cost of interpolation. We propose a VLSI architecture for interpolation that uses a transformation of the received word to reduce the number of iterations of the interpolation algorithm. We also show how the memory requirements can be reduced and an important operation, the Hasse derivative, can be efficiently implemented in VLSI.
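
For readers unfamiliar with the Hasse derivative mentioned above: the r-th Hasse derivative of f(x) = Σ aᵢ xⁱ is Σ C(i, r) aᵢ xⁱ⁻ʳ, which stays meaningful in finite fields where the ordinary r-th derivative can vanish identically. The sketch below is a simplified illustration for univariate polynomials over a prime field GF(p). The decoder itself works with bivariate polynomials, typically over GF(2^m), and its hardware realization is not shown here. The function name is illustrative.

```python
from math import comb


def hasse_derivative(coeffs, r, p):
    """r-th Hasse derivative of a polynomial over GF(p), p prime.

    coeffs[i] is the coefficient of x**i; the result uses the same convention.
    """
    # Coefficient of x**(i - r) is C(i, r) * a_i; terms with i < r drop out.
    return [(comb(i, r) * a) % p for i, a in enumerate(coeffs)][r:]
```

As a check on the definition: for f(x) = x² over GF(2), the ordinary second derivative is 2 ≡ 0, whereas `hasse_derivative([0, 0, 1], 2, 2)` gives the constant polynomial 1.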

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6 μm CMOS process and simulated measurements show that the worst-case delay is 1.56 ns, demonstrating 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.
Abstract: In this paper, a new logic-design style called Pseudo Dynamic Logic (SDL) is introduced. In this logic-design style, the internal nodes of the logic circuits are not precharged to high or low values, rather the initial charges on nodes are shared to yield an intermediate precharge value for faster evaluation. A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6 μm CMOS process. Simulated measurements on this adder show that the worst-case delay is 1.56 ns. This demonstrates 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.

Journal ArticleDOI
TL;DR: Simulations conducted on programmable logic show a sustained advantage over commercial chips for a representative set of applications, while prospective results on VLSI technology are also promising.
Abstract: Residue number system (RNS) is explored for implementation of fast digital signal processors with the design of an RNS-based SIMD RISC processor. Simulations conducted on programmable logic show a sustained advantage over commercial chips for a representative set of applications, while prospective results on VLSI technology are also promising.
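
The appeal of a residue number system for fast arithmetic is that addition and multiplication split into small, independent, carry-free channels, one per modulus. The sketch below shows this with a tiny set of pairwise-coprime moduli and Chinese Remainder Theorem reconstruction. The moduli `(7, 11, 13)` and the function names are illustrative assumptions, not the processor's actual parameters.

```python
from math import prod

MODULI = (7, 11, 13)   # hypothetical pairwise-coprime moduli; range is 0..1000


def to_rns(x):
    """Encode an integer as its residues modulo each channel."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Channel-wise multiplication, equally independent per channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))


def from_rns(r):
    """Decode residues back to an integer via the Chinese Remainder Theorem."""
    M = prod(MODULI)
    total = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        total += ri * Mi * pow(Mi, -1, mi)   # modular inverse of Mi modulo mi
    return total % M
```

Results are only valid while they stay below the product of the moduli (1001 here); conversion back to binary is the comparatively expensive step, which is why RNS suits workloads dominated by additions and multiplications.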

Proceedings ArticleDOI
07 Aug 2002
TL;DR: An effective systematic design method is proposed to construct several efficient VLSI architectures of 1-D and 2-D lifting-based discrete wavelet transform that are more efficient than prior art in terms of arithmetic units and memory storage.
Abstract: In this paper, an effective systematic design method is proposed to construct several efficient VLSI architectures of 1-D and 2-D lifting-based discrete wavelet transform. This design method first performs a specific lifting factorization for any finite discrete wavelet transform filter to obtain an optimal algorithm representation for hardware implementation. The optimized algorithm then turns into 1-D systolic architectures through dependence graph formation and systolic arrays mapping. Based on the 1-D architectures, a general 2-D discrete wavelet transform framework is used to construct the corresponding 2-D architectures. According to the comparison results, the constructed VLSI architectures are more efficient than prior art in terms of arithmetic units and memory storage.

Proceedings ArticleDOI
12 Aug 2002
TL;DR: In this article, two runtime mechanisms for reducing the leakage current of a CMOS circuit are described, in which the "sleep" signal is used to shift in a new set of external inputs and pre-selected internal signals into the circuit with the goal of setting the logic values of all of the internal signals so as to minimize the total leakage current in the circuit.
Abstract: This paper describes two runtime mechanisms for reducing the leakage current of a CMOS circuit. In both cases, it is assumed that the system or environment produces a "sleep" signal that can be used to indicate that the circuit is in a standby mode. In the first method, the "sleep" signal is used to shift in a new set of external inputs and pre-selected internal signals into the circuit with the goal of setting the logic values of all of the internal signals so as to minimize the total leakage current in the circuit. This minimization is possible because the leakage current of a CMOS gate is a strong function of the input combination applied to its inputs. In the second method, NMOS and PMOS transistors are added to some of the gates in the circuit to increase the controllability of the internal signals of the circuit and decrease the leakage current of the gates using the "stack effect". This is, however, done carefully so that the minimum leakage is achieved subject to a delay constraint for all input-output paths in the circuit. In both cases, Boolean satisfiability is used to formulate the problems, which are subsequently solved by employing a highly efficient SAT solver. Experimental results on the circuits in the MCNC91 benchmark suite demonstrate that it is possible to reduce the leakage current by up to 70% in VLSI circuits at the expense of a very small overhead.

Patent
08 Apr 2002
TL;DR: In this article, a method and associated apparatus for the design and manufacture of VLSI circuits is described, which incorporates therein a method for routing connections between component tiles of the circuit being designed.
Abstract: Disclosed herein is a method and associated apparatus for the design and manufacture of VLSI circuit which incorporates therein a method for routing connections between component tiles of the VLSI circuit being designed.

Journal ArticleDOI
TL;DR: A new architecture for the decoder core with improved area and power dissipation properties is presented and partitioning techniques are proposed to reduce the power consumption of the decoding memories.
Abstract: The use of "turbo codes" has been proposed for several applications, including the development of wireless systems, where highly reliable transmission is required at very low signal-to-noise ratios (SNR). The problem of extracting the best coding gains from this kind of code has been deeply investigated in recent years. The hardware implementation of turbo codes is also a very challenging topic, mainly due to the iterative nature of the decoding process, which demands an operating frequency much higher than the data rate; in the case of wireless applications, the design constraints become even stricter due to the low-cost and low-power requirements. This paper first presents a new architecture for the decoder core with improved area and power dissipation properties; then partitioning techniques are proposed to reduce the power consumption of the decoder memories. It is proven that most of the power is dissipated by the large RAM units required by the decoder, so the described technique is very efficient: an average power saving of 70% with an area overhead of 23% has been obtained on a set of analyzed architectures.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods.
Abstract: GDI (Gate Diffusion Input), a new technique of low-power digital circuit design, is described. This technique allows reducing power consumption, delay and area of digital circuits, while maintaining low complexity of logic design. Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods. A variety of logic gates have been implemented in 0.35 μm technology to compare the GDI technique with CMOS and PTL. A prototype test chip of 8-bit CLA adder has been fabricated, based on GDI and CMOS cell libraries, showing up to 45% reduction in power-delay product in GDI. Properties of implemented circuits are discussed, simulation results are reported and measurements of a test chip are presented.