Showing papers on "Very-large-scale integration" published in 2002

Book•

Design of integrated circuits for optical communications

12 Sep 2002

TL;DR: This book systematically takes the reader from basic concepts to advanced topics, establishing both rigor and intuition in the design of high-speed integrated circuits for optical communication systems.

Abstract: Design of Integrated Circuits for Optical Communications deals with the design of high-speed integrated circuits for optical communication systems. Written for both students and practicing engineers, the book systematically takes the reader from basic concepts to advanced topics, establishing both rigor and intuition. The text emphasizes analysis and design in modern VLSI technologies, particularly CMOS, and presents numerous broadband circuit techniques. Leading researcher Behzad Razavi is also the author of Design of Analog CMOS Integrated Circuits. Table of contents: 1. Introduction to Optical Communications; 2. Basic Concepts; 3. Optical Devices; 4. Transimpedance Amplifiers; 5. Limiting Amplifiers and Output Buffers; 6. Oscillator Fundamentals; 7. LC Oscillators; 8. Phase-Locked Loops; 9. Clock and Data Recovery; 10. Multiplexers and Laser Drivers.

693 citations

Journal Article•DOI•

Performance analysis of low-power 1-bit CMOS full adder cells

A.M. Shams¹, T.K. Darwish², Magdy Bayoumi²•Institutions (2)

Intel¹, University of Louisiana at Lafayette²

01 Feb 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A performance analysis of 1-bit full-adder cells is presented: the adder cell is decomposed into smaller modules, and several designs of each module are developed, prototyped, simulated, and analyzed.

Abstract: A performance analysis of a 1-bit full-adder cell is presented. The adder cell is anatomized into smaller modules. The modules are studied and evaluated extensively. Several designs of each of them are developed, prototyped, simulated and analyzed. Twenty different 1-bit full-adder cells are constructed (most of them are novel circuits) by connecting combinations of different designs of these modules. Each of these cells exhibits different power consumption, speed, area, and driving capability figures. Two realistic circuit structures that include adder cells are used for simulation. A library of full-adder cells is developed and presented so that circuit designers can pick the full-adder cell that satisfies their specific applications.
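
The modular view described in this abstract can be illustrated at the logic level. The following is a minimal sketch, assuming a split into an XOR stage, a sum stage, and a multiplexer-style carry stage; the abstract does not name the modules, so this decomposition and all function names are hypothetical, and nothing here models the transistor-level power, speed, or area figures that the paper evaluates.

```python
from itertools import product


def xor_module(a, b):
    """Intermediate signal h = a XOR b (assumed first-stage module)."""
    return a ^ b


def sum_module(h, cin):
    """Sum output: h XOR carry-in."""
    return h ^ cin


def carry_module(a, h, cin):
    """Carry output in multiplexer form: select cin when h is 1, else a."""
    return cin if h else a


def full_adder(a, b, cin):
    """Compose the three modules into a 1-bit full adder; returns (sum, carry)."""
    h = xor_module(a, b)
    return sum_module(h, cin), carry_module(a, h, cin)


def verify():
    """Check all eight input combinations against integer addition."""
    for a, b, cin in product((0, 1), repeat=3):
        s, c = full_adder(a, b, cin)
        if 2 * c + s != a + b + cin:
            return False
    return True
```

Swapping in a different implementation of any one module, while leaving the other two untouched, mirrors the mix-and-match construction of adder cells that the abstract describes.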

454 citations

Journal Article•DOI•

A VLSI architecture for lifting-based forward and inverse wavelet transform

K. Andra¹, Chaitali Chakrabarti¹, T. Acharya²•Institutions (2)

Arizona State University¹, Intel²

01 Apr 2002-IEEE Transactions on Signal Processing

TL;DR: An architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000; it consists of two row processors, two column processors, and two memory modules.

Abstract: We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μm technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.
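
For readers unfamiliar with lifting, a one-dimensional software sketch may help. The code below implements the integer 5/3 lifting steps (one predict, one update) with mirrored boundaries, and the matching inverse. It is an illustration only, assuming an even-length integer input; it does not model the row and column processors, memory banks, or schedules of the proposed architecture, and the function names are hypothetical.

```python
def lift53_forward(x):
    """One level of the integer 5/3 lifting transform on an even-length list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) halves.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even-length input")
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: detail = odd sample minus the floor-average of its even
    # neighbours; the right edge is mirrored.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: approximation = even sample plus a rounded quarter of the
    # neighbouring details; the left edge is mirrored.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward by running the same steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Because the inverse subtracts exactly what the forward pass added, reconstruction is exact in integer arithmetic. Each step needs only additions and shifts, which is one reason lifting is attractive for hardware.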

350 citations

Book•

Analog VLSI: Circuits and Principles

Shih-Chii Liu, Tobias Delbrück, Jorgene Kramer, Giacomo Indiveri, Rodney J. Douglas

15 Nov 2002

TL;DR: This book presents the central concepts required for the creative and successful design of analog VLSI circuits and discusses device physics, linear and nonlinear circuit forms, translinear circuits, photodetectors, floating-gate devices, noise analysis, and process technology.

Abstract: From the Publisher: Neuromorphic engineers work to improve the performance of artificial systems through the development of chips and systems that process information collectively using primarily analog circuits. This book presents the central concepts required for the creative and successful design of analog VLSI circuits. The discussion is weighted toward novel circuits that emulate natural signal processing. Unlike most circuits in commercial or industrial applications, these circuits operate mainly in the subthreshold or weak inversion region. Moreover, their functionality is not limited to linear operations, but also encompasses many interesting nonlinear operations similar to those occurring in natural systems. Topics include device physics, linear and nonlinear circuit forms, translinear circuits, photodetectors, floating-gate devices, noise analysis, and process technology.

291 citations

Proceedings Article•DOI•

Impact of deep submicron technology on dependability of VLSI circuits

C. Constantinescu¹•Institutions (1)

Intel¹

23 Jun 2002

TL;DR: It is concluded that the semiconductor industry is approaching a new stage in the design and manufacturing of VLSI circuits, and that fault-tolerance features, specific to custom designed computers, have to be integrated into commercial-off-the-shelf (COTS) VLSI systems in the future, in order to preserve data integrity and limit the impact of transient and intermittent faults.

Abstract: Advances in semiconductor technology have led to impressive performance gains of VLSI circuits, in general, and microprocessors, in particular. However, smaller transistor and interconnect dimensions, lower power voltages, and higher operating frequencies have contributed to increased rates of occurrence of transient and intermittent faults. We address the impact of deep submicron technology on permanent, transient and intermittent classes of faults, and discuss the main trends in circuit dependability. Two case studies exemplify this analysis. The first one deals with intermittent faults induced by manufacturing residuals. The second case study shows that transients generated by timing violations are capable of silently corrupting data. It is concluded that the semiconductor industry is approaching a new stage in the design and manufacturing of VLSI circuits. Fault-tolerance features, specific to custom designed computers, have to be integrated into commercial-off-the-shelf (COTS) VLSI systems in the future, in order to preserve data integrity and limit the impact of transient and intermittent faults.

195 citations

Proceedings Article•DOI•

Congestion-Aware Logic Synthesis

Davide Pandini¹, Larry Pileggi², Andrzej J. Strojwas²•Institutions (2)

STMicroelectronics¹, Carnegie Mellon University²

04 Mar 2002

TL;DR: This paper proposes a novel methodology to incorporate congestion minimization within logic synthesis, and presents results for industrial circuits that validate the approach.

Abstract: In this era of Deep Sub-Micron (DSM) technologies, the impact of interconnects is becoming increasingly important as it relates to integrated circuit (IC) functionality and performance. In the traditional top-down IC design flow, interconnect effects are first taken into account during logic synthesis by way of wireload models. However, for technologies of 0.25 μm and below, the wiring capacitance dominates the gate capacitance and the delay estimation based on fanout and design legacy statistics can be highly inaccurate. In addition, logic block size is no longer dictated solely by total cell area, and is often limited by wiring area resources. For these reasons, wiring congestion is an extremely important design factor, and should be taken into consideration at the earliest possible stages of the design flow. In this paper we propose a novel methodology to incorporate congestion minimization within logic synthesis, and present results for industrial circuits that validate our approach.

177 citations

Proceedings Article•DOI•

Dynamic and leakage power reduction in MTCMOS circuits using an automated efficient gate clustering technique

Mohab Anis¹, M. A. Mahmoud¹, Mohamed I. Elmasry¹, Shawki Areibi²•Institutions (2)

University of Waterloo¹, University of Guelph²

10 Jun 2002

TL;DR: Two techniques are presented for efficient gate clustering in MTCMOS circuits by modeling the problem via Bin-Packing and Set-Partitioning; both offer significant reductions in dynamic and leakage power over previous techniques during the active and standby modes, respectively.

Abstract: Reducing power dissipation is one of the principal subjects in VLSI design today. Scaling causes subthreshold leakage currents to become a large component of total power dissipation. This paper presents two techniques for efficient gate clustering in MTCMOS circuits by modeling the problem via Bin-Packing (BP) and Set-Partitioning (SP) techniques. An automated solution is presented, and both techniques are applied to six benchmarks to verify functionality. Both methodologies offer significant reduction in both dynamic and leakage power over previous techniques during the active and standby modes respectively. Furthermore, the SP technique takes the circuit's routing complexity into consideration, which is critical for Deep Sub-Micron (DSM) implementations. Sufficient performance is achieved, while significantly reducing the overall sleep transistors' area. Results obtained indicate that our proposed techniques can achieve on average 90% savings for leakage power and 15% savings for dynamic power.
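
To make the bin-packing framing concrete, here is a generic first-fit-decreasing heuristic. It is a sketch under loud assumptions: the abstract does not say which bin-packing algorithm, item weights, or capacities the authors use, so the integer weights (standing in for some per-gate cost) and the capacity (standing in for a per-cluster limit) are hypothetical, as is the function name.

```python
def first_fit_decreasing(weights, capacity):
    """Group item indices into bins so each bin's total weight <= capacity.

    Classic first-fit-decreasing heuristic: visit items from heaviest to
    lightest and place each one in the first bin that still has room.
    """
    bins = []  # each bin: {"free": remaining capacity, "items": [indices]}
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for idx in order:
        w = weights[idx]
        if w > capacity:
            raise ValueError(f"item {idx} exceeds the bin capacity")
        for b in bins:
            if b["free"] >= w:
                b["free"] -= w
                b["items"].append(idx)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append({"free": capacity - w, "items": [idx]})
    return [b["items"] for b in bins]
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` packs the six items into two bins. Fewer bins would correspond to fewer clusters under the assumed mapping.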

174 citations

Patent•

Gate array core cell for VLSI ASIC devices

Jai P. Bansal¹•Institutions (1)

BAE Systems¹

19 Dec 2002

TL;DR: A gate array core cell is proposed that reduces overall wiring lengths and parasitic capacitance and increases the circuit density and performance of gate array ASIC components, while reducing mask cost and processing time by about 50 percent.

Abstract: A very efficient gate array core cell in which the base core cell consists of a group of 6 PMOS transistors and a group of 6 NMOS transistors. It also includes pre-wiring of 2 of the 6 PMOS transistors with 2 of the 6 NMOS transistors at polysilicon level or at local interconnect level, while leaving the remaining PMOS and NMOS transistors as individual transistors to be interconnected during the functional ASIC metallization process. The core cell also has 2 polysilicon or local interconnect wires embedded in it, which can be used to interconnect transistors for logic function implementation. The core cell defined in this invention is highly flexible and has been analyzed to interconnect all types of logic and memory functions needed for ASIC designs. The layout of the transistors, pre-wiring of the strategic transistors at polysilicon level or at local interconnect level, and embedded polysilicon or local interconnect wires reduce the core cell size significantly. This core cell design reduces the overall wiring lengths and parasitic capacitance, which in turn reduces delays and power dissipation and increases ASIC performance and circuit density. Gate array ASIC components designed using this core cell provide circuit density, performance and power dissipation characteristics comparable to standard-cell ASICs, but with the advantage of reducing the mask cost and processing time by about 50 percent.

171 citations

Book•

VLSI for wireless communication

Bosco Leung

01 Jan 2002

TL;DR: This book covers communication concepts, receiver architectures, frequency synthesizer loop-filter and system design, and transmitter architectures and power amplifiers.

Abstract: Contents: Communication Concepts: Circuit Designer Perspective; Receiver Architectures; Low Noise Amplifier; Active Mixer; Passive Mixer; Analog-to-Digital Converters; Frequency Synthesizer: Phase/Frequency Processing Components; Frequency Synthesizer: Loop Filter and System Design; Transmitter Architectures and Power Amplifier.

164 citations

Journal Article•DOI•

Energy-efficient noise-tolerant dynamic styles for scaled-down CMOS and MTCMOS technologies

Mohab Anis¹, M.W. Allam², Mohamed I. Elmasry¹•Institutions (2)

University of Waterloo¹, PMC-Sierra²

22 Apr 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new high-speed domino circuit, called HS-Domino, which resolves the tradeoff between performance and reliability in conventional CD-domino logic while dissipating low dynamic power with minimal area overhead and extends domino's operation in the deep submicron regime.

Abstract: A new high-speed domino circuit, called HS-Domino has been developed. HS-Domino resolves the tradeoff between performance and reliability in conventional CD-domino logic while dissipating low dynamic power with minimal area overhead. HS-Domino, therefore, extends domino's operation in the deep submicron regime. A multithreshold implementation of HS-Domino is also devised to achieve substantially low leakage values during standby, while maintaining high performance and low power during the active mode. Furthermore, the generic multithreshold scheme is applied to differential cascode voltage switch (DDCVS) logic.

136 citations

Proceedings Article•DOI•

An effective congestion driven placement framework

Ulrich Brenner¹, André Rohe¹•Institutions (1)

University of Bonn¹

07 Apr 2002

TL;DR: A fast but reliable way to detect routing criticalities in VLSI chips by using a congestion estimator for a dynamic avoidance of routability problems in one single run of the placement algorithm.

Abstract: We present a fast but reliable way to detect routing criticalities in VLSI chips. In addition, we show how this congestion estimation can be incorporated into a partitioning-based placement algorithm. In contrast to previous approaches, we do not rerun parts of the placement algorithm or apply a post-placement optimization, but we use our congestion estimator for a dynamic avoidance of routability problems in one single run of the placement algorithm. Computational experiments on chips with up to 1,300,000 cells are presented: the framework reduces the usage of the most critical routing edges by 9.0% on average, and the running time increase for the placement is about 8.7%. However, due to the smaller congestion, the running time of routing tools can be decreased drastically, so the total time for placement and (global) routing is decreased by 47% on average.

Proceedings Article•DOI•

Fabrication technologies for three-dimensional integrated circuits

Rafael Reif¹, A. Fan¹, Kuan-Neng Chen¹, Shamik Das¹•Institutions (1)

Massachusetts Institute of Technology¹

07 Aug 2002

TL;DR: The MIT approach to 3D VLSI integration is based on low-temperature Cu-Cu wafer bonding, where device wafers are bonded in a face-to-back manner, with short vertical vias and Cu-Cu pads as the inter-wafer throughway.

Abstract: The MIT approach to 3D VLSI integration is based on low-temperature Cu-Cu wafer bonding. Device wafers are bonded in a face-to-back manner, with short vertical vias and Cu-Cu pads as the inter-wafer throughway. In our scheme, there are several reliability criteria, which include: (a) structural integrity of the Cu-Cu bond; (b) Cu-Cu contact electrical characteristics; and (c) process flow efficiency and repeatability. In addition, CAD tools are needed to aid in design and layout of 3DICs. This paper discusses recent results in all these areas.

Journal Article•DOI•

Architectures and VLSI implementations of the AES-Proposal Rijndael

Nicolas Sklavos, Odysseas Koufopavlou

01 Dec 2002-IEEE Transactions on Computers

TL;DR: Two architectures and VLSI implementations of the AES Proposal, Rijndael, are presented; both support encryption and decryption, reduce the required hardware resources, and achieve high-speed performance.

Abstract: Two architectures and VLSI implementations of the AES Proposal, Rijndael, are presented in this paper. These alternative architectures operate for both the encryption and decryption processes. They reduce the required hardware resources and achieve high-speed performance. Their design philosophy is completely different. The first uses feedback logic and reaches a throughput value equal to 259 Mbit/sec. It performs efficiently in applications with low covered area resources. The second architecture is optimized for high-speed performance using a pipelined technique. Its throughput can reach 3.65 Gbit/sec.

Journal Article•DOI•

Simultaneous switching noise in on-chip CMOS power distribution networks

K.T. Tang¹, Eby G. Friedman²•Institutions (2)

Broadcom¹, University of Rochester²

01 Aug 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: An analytical expression characterizing the SSN voltage is presented based on a lumped inductive-resistive-capacitive (RLC) model; the peak SSN voltage it predicts is within 10% of SPICE simulations.

Abstract: Simultaneous switching noise (SSN) has become an important issue in the design of the internal on-chip power distribution networks in current very large scale integration/ultra large scale integration (VLSI/ULSI) circuits. An inductive model is used to characterize the power supply rails when a transient current is generated by simultaneously switching the on-chip registers and logic gates in a synchronous CMOS VLSI/ULSI circuit. An analytical expression characterizing the SSN voltage is presented here based on a lumped inductive-resistive-capacitive RLC model. The peak value of the SSN voltage based on this analytical expression is within 10% as compared to SPICE simulations. Design constraints at both the circuit and layout levels are also discussed based on minimizing the effects of the peak value of the SSN voltage.

Proceedings Article•DOI•

Reconfiguration technique for reducing test time and test data volume in Illinois Scan Architecture based designs

A.R. Pandey¹, Janak H. Patel¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

28 Apr 2002

TL;DR: This paper proposes a technique based on the reconfiguration of scan chains to reduce test time and test data volume for Illinois Scan Architecture (ILS) based designs.

Abstract: As the complexity of VLSI circuits is increasing due to the exponential rise in transistor count per chip, testing cost is becoming an important factor in the overall integrated circuit (IC) manufacturing cost. This paper addresses the issue of decreasing test cost by lowering the test data bits and the number of clock cycles required to test a chip. We propose a technique based on the reconfiguration of scan chains to reduce test time and test data volume for Illinois Scan Architecture (ILS) based designs. This technique is presented with details of hardware implementation as well as the test generation and test application procedures. The reduction in test time and test data volume achieved using this technique is quite significant in most circuits.

Proceedings Article•DOI•

High-performance and low-power challenges for sub-70 nm microprocessor circuits

R.K. Krishnamurthy¹, Atila Alvandpour¹, Vivek De¹, S. Borkar¹•Institutions (1)

Intel¹

07 Aug 2002

TL;DR: Circuit techniques to combat increasing switching and leakage power dissipation, poor leakage tolerance of large-signal cache arrays and register files, and worsening global on-chip interconnect scaling trend, are described.

Abstract: CMOS technology scaling is becoming difficult beyond the 70 nm node, raising new design challenges for high-performance and low-power microprocessors. This paper discusses some of the key paradigm shifts required. Circuit techniques to combat (i) increasing switching and leakage power dissipation, (ii) poor leakage tolerance of large-signal cache arrays and register files, and (iii) worsening global on-chip interconnect scaling trend, are described.

Proceedings Article•DOI•

Floating-point bitwidth analysis via automatic differentiation

A.A. Gaffar¹, Oskar Mencer, Wayne Luk, Peter Y. K. Cheung, Nabeel Shirazi•Institutions (1)

Imperial College London¹

16 Dec 2002

TL;DR: This work presents a novel approach to bitwidth- or precision-analysis for floating-point designs, which involves analysing the dataflow graph representation of a design to see how sensitive the output of a node is to changes in the outputs of other nodes: higher sensitivity requires higher precision and hence more output bits.

Abstract: Automatic bitwidth analysis is a key ingredient for high-level programming of FPGAs and high-level synthesis of VLSI circuits. The objective is to find the minimal number of bits to represent a value in order to minimise the circuit area and to improve efficiency of the respective arithmetic operations, while satisfying user-defined numerical constraints. We present a novel approach to bitwidth- or precision-analysis for floating-point designs. The approach involves analysing the dataflow graph representation of a design to see how sensitive the output of a node is to changes in the outputs of other nodes: higher sensitivity requires higher precision and hence more output bits. We automate such sensitivity analysis by a mathematical method called automatic differentiation, which involves differentiating variables in a design with respect to other variables. We illustrate our approach by optimising the bitwidth for two examples, a discrete Fourier transform (DFT) implementation and a Finite Impulse Response (FIR) filter implementation.
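
The sensitivity idea can be shown with forward-mode automatic differentiation on dual numbers. The sketch below is an assumption-laden illustration, not the authors' tool: the `Dual` class, the `sensitivities` helper, and the two-tap `fir2` example are all hypothetical, and the code stops at computing partial derivatives; it does not apply any rule for turning them into bit counts.

```python
class Dual:
    """A value together with its derivative with respect to one chosen input."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sensitivities(f, inputs):
    """Partial derivative of f's output with respect to each input, one pass each."""
    grads = []
    for k in range(len(inputs)):
        args = [Dual(v, 1.0 if i == k else 0.0) for i, v in enumerate(inputs)]
        grads.append(f(*args).der)
    return grads


def fir2(x0, x1, c0, c1):
    """Hypothetical two-tap FIR output: y = c0*x0 + c1*x1."""
    return c0 * x0 + c1 * x1
```

Calling `sensitivities(fir2, [1.0, 2.0, 0.5, 0.25])` yields the derivative of the output with respect to `x0`, `x1`, `c0`, and `c1` in turn. A larger magnitude means that an error in that value is amplified more at the output, which is the quantity the abstract ties to required precision.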

Journal Article•DOI•

A practical implementation of parallel dynamic load balancing for adaptive computing in VLSI device simulation

Yiming Li, Simon M. Sze¹, Tien-Sheng Chao•Institutions (1)

National Chiao Tung University¹

14 Aug 2002-Engineering With Computers

TL;DR: A new parallel semiconductor device simulation using the dynamic load balancing approach based on the adaptive finite volume method with a posteriori error estimation has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library.

Abstract: We present a new parallel semiconductor device simulation using the dynamic load balancing approach. This semiconductor device simulation based on the adaptive finite volume method with a posteriori error estimation has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library. A constructive monotone iterative technique is also applied for solution of the system of nonlinear algebraic equations. Two different parallel versions of the algorithm to perform a complete device simulation are proposed. The first is a dynamic parallel domain decomposition approach, and the second is a parallel current-voltage characteristic points simulation. This implementation shows that a well-designed load balancing simulation can reduce the execution time by up to an order of magnitude. Numerical results on various submicron VLSI devices are presented and compared with measured data to show the accuracy and efficiency of the method.

Patent•

Method and system for displaying VLSI layout data

Jeff Solomon¹•Institutions (1)

Stanford University¹

10 Dec 2002

TL;DR: A VLSI layout editor, and a method for using it, that increases display and re-display speed and accuracy by using properties inherent to VLSI layouts that allow them to be displayed efficiently and accurately, independent of the canonical expression of the design.

Abstract: A VLSI layout editor and method for using same that increases display and re-display speed and accuracy uses properties inherent to VLSI layouts that allow them to be displayed efficiently and accurately independent of the canonical expression of the VLSI design. The VLSI layout editor and methods for using same use precomputed images that each represent a portion of the VLSI layout, a hierarchy cache that includes multiple LOD versions of selected sub-designs in the pre-computed images, and selected direct determination of the viewable representation from the canonical expression for at least one LOD. Apparatus and methods according to the present invention can render a particular type of data whose canonical form is smaller than its corresponding displayed image when the displayed image has geometric properties that allow heuristics and rasterization for dynamic and accurate expansion using selected combined techniques. Texture mapping and mipmapping can be used to accurately reduce, expand and reorder layers in a viewable image expanded from a canonical expression of the VLSI layout.

Proceedings Article•DOI•

Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform

Chao-Tsung Huang¹, Po-Chih Tseng¹, Liang-Gee Chen¹•Institutions (1)

National Taiwan University¹

28 Oct 2002

TL;DR: An efficient VLSI architecture is proposed to provide a variety of hardware implementations for improving and possibly minimizing the critical path and memory requirements of lifting-based discrete wavelet transforms by flipping conventional lifting structures.

Abstract: Using the lifting scheme to construct VLSI architectures for discrete wavelet transforms outperforms using convolution in many aspects, such as computation complexity and boundary extension. Nevertheless, the critical path of the lifting scheme is potentially longer than that of convolution. Although pipelining can reduce the critical path, it will prolong the latency and require more registers for a 1D architecture as well as larger memory size for a 2D line-based architecture. In this paper, an efficient VLSI architecture is proposed to provide a variety of hardware implementations for improving and possibly minimizing the critical path and memory requirements of lifting-based discrete wavelet transforms by flipping conventional lifting structures. Case studies of a JPEG2000 default filter and an integer filter show the efficiency of the proposed flipping structure.

Proceedings Article•DOI•

FPGA implementation of a neural network for a real-time hand tracking system

M. Krips, T. Lammert¹, Anton Kummert¹•Institutions (1)

University of Wuppertal¹

29 Jan 2002

TL;DR: A real-time localization and tracking algorithm has been developed for detecting human hands in video images using a single-pixel-based classification, so that a continuous data stream can be processed.

Abstract: The advantage of parallel computing of artificial neural networks can be combined with the potentials of VLSI circuits in order to design a real-time detection and tracking system applied to video images. Based on these facts, a real-time localization and tracking algorithm has been developed for detecting human hands in video images. Because of the real-time requirement, a single-pixel-based classification is sought, so that a continuous data stream can be processed. Consequently, no storage of full images or parts of them is necessary. The classification of whether a pixel belongs to a hand or to the background is done by analyzing the RGB values of a single pixel by means of an artificial neural network. To obtain the full processing speed of this neural network, a hardware solution is realized in a Field Programmable Gate Array (FPGA).

Proceedings Article•DOI•

Concurrent flip-flop and repeater insertion for high performance integrated circuits

Pasquale Cocchini¹•Institutions (1)

Intel¹

10 Nov 2002

TL;DR: This work proposes a new latency-aware technique for the performance-driven concurrent insertion of flip-flops and repeaters in VLSI circuits, and overwhelming evidence showing an exponential increase in the number of pipelined interconnects with process scaling is presented.

Abstract: For many years, CMOS process scaling has allowed a steady increase in the operating frequency and integration density of integrated circuits. Only recently, however, have we reached a point where it takes several clock cycles for global signals to traverse a complex digital system such as a modern microprocessor. Thus, interconnect latency must be taken into account in current and future design tools at the architectural as well as synthesis level. For this purpose, the author proposes a new latency-aware technique for the performance-driven concurrent insertion of flip-flops and repeaters in VLSI circuits. Overwhelming evidence showing an exponential increase in the number of pipelined interconnects with process scaling, for high-performance microprocessors as well as high-end ASICs, is also presented. This increase indicates a radical change in current design methodologies to cope with this new emerging problem.

Proceedings Article•DOI•

A VLSI architecture for interpolation in soft-decision list decoding of Reed-Solomon codes

Warren J. Gross¹, Frank R. Kschischang¹, Ralf Koetter, R.G. Gulak•Institutions (1)

University of Toronto¹

10 Dec 2002

TL;DR: A VLSI architecture for interpolation is proposed that uses a transformation of the received word to reduce the number of iterations of the interpolation algorithm; the paper also shows how the memory requirements can be reduced and how an important operation, the Hasse derivative, can be efficiently implemented in VLSI.

Abstract: The Koetter-Vardy algorithm is an algebraic soft-decision decoder for Reed-Solomon codes which is based on the Guruswami-Sudan list decoder. There are three main steps: 1) multiplicity calculation, 2) interpolation and 3) root finding. The Koetter-Vardy algorithm is challenging to implement due to the high cost of interpolation. We propose a VLSI architecture for interpolation that uses a transformation of the received word to reduce the number of iterations of the interpolation algorithm. We also show how the memory requirements can be reduced and an important operation, the Hasse derivative, can be efficiently implemented in VLSI.
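
The Hasse derivative mentioned in the abstract has a simple closed form for univariate polynomials: the r-th Hasse derivative of x^n is C(n, r) x^(n-r). The sketch below applies that rule coefficient by coefficient, assuming a prime field GF(p) for readability. The decoder in the paper works with bivariate polynomials over the code's own field, which this example does not attempt to reproduce, and the function name is hypothetical.

```python
from math import comb


def hasse_derivative(coeffs, r, p):
    """r-th Hasse derivative of a polynomial over GF(p), p prime.

    coeffs[n] is the coefficient of x**n. The result uses the same layout:
    result[k] is the coefficient of x**k, equal to C(k + r, r) * coeffs[k + r].
    """
    shifted = [(comb(n, r) * c) % p for n, c in enumerate(coeffs)][r:]
    return shifted or [0]  # derivative beyond the degree is the zero polynomial
```

Unlike the ordinary r-th derivative, the Hasse derivative carries no factor of r!, so it does not vanish merely because r! is zero in the field. In GF(2), for instance, the ordinary second derivative of x² is 0, while `hasse_derivative([0, 0, 1], 2, 2)` gives the constant 1.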

Proceedings Article•DOI•

Pseudo dynamic logic (SDL): a high-speed and low-power dynamic logic family

G.R. Chaji¹, Sied Mehdi Fakhraie¹, Kenneth C. Smith•Institutions (1)

University of Tehran¹

07 Aug 2002

TL;DR: A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6 /spl mu/m CMOS process and simulated measurements show that the worst-case delay is 1.56 ns, demonstrating 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.

Abstract: In this paper, a new logic-design style called Pseudo Dynamic Logic (SDL) is introduced. In this logic-design style, the internal nodes of the logic circuits are not precharged to high or low values; rather, the initial charges on nodes are shared to yield an intermediate precharge value for faster evaluation. A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6 µm CMOS process. Simulated measurements on this adder show that the worst-case delay is 1.56 ns. This demonstrates a 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.

Journal Article•DOI•

RNS-enabled digital signal processor design

Javier Ramírez, Antonio García¹, Sergio Lopez-Buedo², Antonio Lloris¹•Institutions (2)

University of Granada¹, Autonomous University of Madrid²

14 Mar 2002-Electronics Letters

TL;DR: Simulations conducted on programmable logic show a sustained advantage over commercial chips for a representative set of applications, while prospective results on VLSI technology are also promising.

Abstract: Residue number system (RNS) is explored for implementation of fast digital signal processors with the design of an RNS-based SIMD RISC processor. Simulations conducted on programmable logic show a sustained advantage over commercial chips for a representative set of applications, while prospective results on VLSI technology are also promising.
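
To make the residue number system idea concrete, here is a minimal sketch of RNS arithmetic in software. The moduli `(7, 11, 13)` and all function names are illustrative assumptions, not taken from the paper, which describes a SIMD RISC processor rather than a software routine. The point it shows is that addition and multiplication act independently on each residue channel, with no carries between channels.

```python
from math import prod

# Pairwise coprime moduli; illustrative choice, not from the paper.
MODULI = (7, 11, 13)  # dynamic range M = 7 * 11 * 13 = 1001

def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its residues modulo each modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; each channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; again carry-free across channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese remainder theorem (result is modulo M)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M

a, b = to_rns(25), to_rns(30)
print(a, b)                     # (4, 3, 12) (2, 8, 4)
print(from_rns(rns_mul(a, b)))  # 750
print(from_rns(rns_add(a, b)))  # 55
```

Results are only correct while they stay below the dynamic range M, so a real design has to pick moduli large enough for the target application.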

Proceedings Article•DOI•

Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method

Chao-Tsung Huang¹, Po-Chih Tseng¹, Liang-Gee Chen¹•Institutions (1)

National Taiwan University¹

07 Aug 2002

TL;DR: An effective systematic design method is proposed to construct several efficient VLSI architectures of 1-D and 2-D lifting-based discrete wavelet transform that are more efficient than prior art in terms of arithmetic units and memory storage.

Abstract: In this paper, an effective systematic design method is proposed to construct several efficient VLSI architectures of 1-D and 2-D lifting-based discrete wavelet transform. This design method first performs a specific lifting factorization for any finite discrete wavelet transform filter to obtain an optimal algorithm representation for hardware implementation. The optimized algorithm is then turned into 1-D systolic architectures through dependence graph formation and systolic array mapping. Based on the 1-D architectures, a general 2-D discrete wavelet transform framework is used to construct the corresponding 2-D architectures. According to the comparison results, the constructed VLSI architectures are more efficient than prior art in terms of arithmetic units and memory storage.
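
For readers unfamiliar with lifting, the sketch below shows one level of a 1-D lifting-based transform in software, using the integer 5/3 filter as an example. The filter choice, the symmetric boundary handling, and the function names are assumptions made for illustration. The paper's method applies to any finite filter and targets systolic hardware, which this sketch does not model.

```python
def lift_53_forward(x):
    """One level of the integer 5/3 lifting transform (even-length input).

    Returns (s, d): the low-pass and high-pass coefficient lists.
    """
    s, d = list(x[0::2]), list(x[1::2])
    n = len(d)
    # Predict step: subtract the average of the two neighbouring even samples.
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[i]  # mirror at the right edge
        d[i] -= (s[i] + right) // 2
    # Update step: add a rounded quarter of the neighbouring detail values.
    for i in range(n):
        left = d[i - 1] if i > 0 else d[i]       # mirror at the left edge
        s[i] += (left + d[i] + 2) // 4
    return s, d

def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the steps in reverse with flipped signs."""
    s, d = list(s), list(d)
    n = len(d)
    for i in range(n):
        left = d[i - 1] if i > 0 else d[i]
        s[i] -= (left + d[i] + 2) // 4
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[i]
        d[i] += (s[i] + right) // 2
    x = [0] * (2 * n)
    x[0::2], x[1::2] = s, d
    return x

ramp = [0, 1, 2, 3, 4, 5, 6, 7]
s, d = lift_53_forward(ramp)
print(s, d)                   # [0, 2, 4, 6] [0, 0, 0, 1]
print(lift_53_inverse(s, d))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Each lifting step modifies one half of the samples using only the other half, so the inverse is obtained by replaying the same steps backwards. That in-place structure is what makes lifting attractive for hardware with limited arithmetic units and memory.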

Proceedings Article•DOI•

Runtime mechanisms for leakage current reduction in CMOS VLSI circuits

Afshin Abdollahi¹, Massoud Pedram¹, Farzan Fallah²•Institutions (2)

University of Southern California¹, Fujitsu²

12 Aug 2002

TL;DR: Two runtime mechanisms for reducing the leakage current of a CMOS circuit in standby mode are described: the first uses the "sleep" signal to shift in a new set of external inputs and pre-selected internal signals chosen to minimize the total leakage current, and the second adds NMOS and PMOS transistors to some gates to exploit the "stack effect" subject to a delay constraint.

Abstract: This paper describes two runtime mechanisms for reducing the leakage current of a CMOS circuit. In both cases, it is assumed that the system or environment produces a "sleep" signal that can be used to indicate that the circuit is in a standby mode. In the first method, the "sleep" signal is used to shift in a new set of external inputs and pre-selected internal signals into the circuit with the goal of setting the logic values of all of the internal signals so as to minimize the total leakage current in the circuit. This minimization is possible because the leakage current of a CMOS gate is a strong function of the input combination applied to its inputs. In the second method, NMOS and PMOS transistors are added to some of the gates in the circuit to increase the controllability of the internal signals of the circuit and decrease the leakage current of the gates using the "stack effect". This is, however, done carefully so that the minimum leakage is achieved subject to a delay constraint for all input-output paths in the circuit. In both cases, Boolean satisfiability is used to formulate the problems, which are subsequently solved by employing a highly efficient SAT solver. Experimental results on the circuits in the MCNC91 benchmark suite demonstrate that it is possible to reduce the leakage current by up to 70% in VLSI circuits at the expense of a very small overhead.
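
The first method relies on leakage depending on the input combination, which a toy example can illustrate. The sketch below exhaustively searches for the input vector that minimizes total leakage of a three-gate NAND circuit. Everything here is hypothetical: the circuit, the leakage table (placeholder numbers in arbitrary units), and the function names. The paper formulates the problem as Boolean satisfiability and uses a SAT solver; brute-force enumeration is swapped in because it is enough for three inputs.

```python
from itertools import product

# Hypothetical leakage of a 2-input NAND for each input pair, in arbitrary
# units. Placeholder values for illustration only; not data from the paper.
NAND2_LEAKAGE = {(0, 0): 1, (0, 1): 3, (1, 0): 4, (1, 1): 9}

def nand(a, b):
    return 1 - (a & b)

def total_leakage(a, b, c):
    """Leakage of the toy circuit y = NAND(NAND(a, b), NAND(b, c))."""
    n1, n2 = nand(a, b), nand(b, c)
    return (NAND2_LEAKAGE[(a, b)]
            + NAND2_LEAKAGE[(b, c)]
            + NAND2_LEAKAGE[(n1, n2)])

def min_leakage_vector():
    """Exhaustive search over all 2**3 input vectors (not a SAT formulation)."""
    return min(product((0, 1), repeat=3), key=lambda v: total_leakage(*v))

best = min_leakage_vector()
print(best, total_leakage(*best))  # (0, 0, 0) 11
```

Exhaustive search grows exponentially with the number of inputs, which is why a scalable formulation such as the SAT-based one in the paper is needed for realistic circuits.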

Patent•

Detailed method for routing connections using tile expansion techniques and associated methods for designing and manufacturing VLSI circuits

Zhaoyun Xing¹•Institutions (1)

Advanced Micro Devices¹

08 Apr 2002

TL;DR: In this article, a method and associated apparatus for the design and manufacture of VLSI circuits is described, which incorporates therein a method for routing connections between component tiles of the circuit being designed.

Abstract: Disclosed herein are a method and associated apparatus for the design and manufacture of VLSI circuits, incorporating a method for routing connections between component tiles of the VLSI circuit being designed.

Journal Article•DOI•

Architectural strategies for low-power VLSI turbo decoders

Guido Masera¹, M. Mazza¹, Gianluca Piccinini¹, F. Viglione¹, Maurizio Zamboni¹•Institutions (1)

Polytechnic University of Turin¹

01 Jun 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new architecture for the decoder core with improved area and power dissipation properties is presented and partitioning techniques are proposed to reduce the power consumption of the decoding memories.

Abstract: The use of "turbo codes" has been proposed for several applications, including the development of wireless systems, where highly reliable transmission is required at very low signal-to-noise ratios (SNR). The problem of extracting the best coding gains from this kind of code has been deeply investigated in recent years. The hardware implementation of turbo codes is also a very challenging topic, mainly due to the iterative nature of the decoding process, which demands an operating frequency much higher than the data rate; in the case of wireless applications, the design constraints become even stricter due to the low-cost and low-power requirements. This paper first presents a new architecture for the decoder core with improved area and power dissipation properties; then partitioning techniques are proposed to reduce the power consumption of the decoder memories. It is proven that most of the power is dissipated by the large RAM units required by the decoder, so the described technique is very efficient: an average power saving of 70% with an area overhead of 23% has been obtained on a set of analyzed architectures.

Proceedings Article•DOI•

Gate-diffusion input (GDI) - a technique for low power design of digital circuits: analysis and characterization

Arkadiy Morgenshtein¹, Alexander Fish², Israel A. Wagner³•Institutions (3)

Technion – Israel Institute of Technology¹, Ben-Gurion University of the Negev², IBM³

07 Aug 2002

TL;DR: Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods.

Abstract: GDI (Gate Diffusion Input), a new technique for low-power digital circuit design, is described. This technique allows reducing power consumption, delay and area of digital circuits, while maintaining low complexity of logic design. Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods. A variety of logic gates have been implemented in 0.35 µm technology to compare the GDI technique with CMOS and PTL. A prototype test chip of an 8-bit CLA adder has been fabricated, based on GDI and CMOS cell libraries, showing up to 45% reduction in power-delay product in GDI. Properties of implemented circuits are discussed, simulation results are reported and measurements of a test chip are presented.
