
Showing papers on "Very-large-scale integration" published in 2003


Book ChapterDOI
TL;DR: There is a growing interest in Networks on Chips (NoC) that is related to the evolution of integrated circuit technology and to the growing requirements in performance and portability of electronic systems.
Abstract: We are witnessing a growing interest in Networks on Chips (NoC) that is related to the evolution of integrated circuit technology and to the growing requirements in performance and portability of electronic systems. Current integrated circuits contain several processing cores, and even relatively simple systems, such as cellular telephones, behave as multiprocessors. Moreover, many electronic systems consist of heterogeneous components and they require efficient on-chip communication. In the last few years, multiprocessing platforms have been developed to address high performance computation, such as image rendering. Examples are Sony’s emotion engine [OKA] and IBM’s cell chip [PHAM] where on-chip communication efficiency is key to the overall system performance.

641 citations


Journal ArticleDOI
TL;DR: The main trends and challenges in circuit reliability are discussed, and evolving techniques for dealing with them are explained.
Abstract: Deep-submicron technology is having a significant impact on permanent, intermittent, and transient classes of faults. This article discusses the main trends and challenges in circuit reliability, and explains evolving techniques for dealing with them.

622 citations


Journal ArticleDOI
TL;DR: In this paper, the authors sketch a basic architecture for nanoscale electronics based on carbon nanotubes, silicon nanowires, and nano-scale FETs, which can provide universal logic functionality with all logic and signal restoration operating at the nanoscale.
Abstract: Advances in our basic scientific understanding at the molecular and atomic level place us on the verge of engineering designer structures with key features at the single nanometer scale. This offers us the opportunity to design computing systems at what may be the ultimate limits on device size. At this scale, we are faced with new challenges and a new cost structure which motivates different computing architectures than we found efficient and appropriate in conventional very large scale integration (VLSI). We sketch a basic architecture for nanoscale electronics based on carbon nanotubes, silicon nanowires, and nano-scale FETs. This architecture can provide universal logic functionality with all logic and signal restoration operating at the nanoscale. The key properties of this architecture are its minimalism, defect tolerance, and compatibility with emerging bottom-up nanoscale fabrication techniques. The architecture further supports micro-to-nanoscale interfacing for communication with conventional integrated circuits and bootstrap loading.

472 citations


Journal ArticleDOI
TL;DR: This paper is an in-depth review on silicon implementations of threshold logic gates that covers several decades and detail numerous very-large-scale integration (VLSI) implementations including capacitive, conductance/current, and pseudo-nMOS and output-wired-inverters, as well as many differential solutions.
Abstract: This paper is an in-depth review on silicon implementations of threshold logic gates that covers several decades. In this paper, we will mention early MOS threshold logic solutions and detail numerous very-large-scale integration (VLSI) implementations including capacitive (switched capacitor and floating gate with their variations), conductance/current (pseudo-nMOS and output-wired-inverters, including a plethora of solutions evolved from them), as well as many differential solutions. At the end, we will briefly mention other implementations, e.g., based on negative resistance devices and on single electron technologies.

240 citations
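
The review above concerns circuit implementations, but the function every threshold logic gate computes is simple to state: the output is 1 when the weighted sum of the binary inputs reaches a threshold. The sketch below is only a behavioral model of that definition, not of any circuit from the review; the function names (`threshold_gate`, `and3`, `or3`, `majority3`) and the chosen weights are illustrative assumptions.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# With unit weights, only the threshold distinguishes these three functions.
def and3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 3)


def or3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 1)


def majority3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 2)


def truth_table(gate):
    """List the gate output for every 3-bit input combination."""
    return [(bits, gate(*bits)) for bits in product((0, 1), repeat=3)]
```

Calling `truth_table(majority3)` enumerates all eight input combinations, which shows how one gate structure covers AND, OR, and majority by changing a single parameter.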


Journal ArticleDOI
TL;DR: This contribution describes the design and performance testing of an Advanced Encryption Standard (AES) compliant encryption chip that delivers 2.29 GB/s of encryption throughput at 56 mW of power consumption in a 0.18-μm CMOS standard cell technology.
Abstract: This contribution describes the design and performance testing of an Advanced Encryption Standard (AES) compliant encryption chip that delivers 2.29 GB/s of encryption throughput at 56 mW of power consumption in a 0.18-μm CMOS standard cell technology. This integrated circuit implements the Rijndael encryption algorithm, at any combination of block lengths (128, 192, or 256 bits) and key lengths (128, 192, or 256 bits). We present the chip architecture and discuss the design optimizations. We also present measurement results that were obtained from a set of 14 test samples of this chip.

224 citations


Proceedings ArticleDOI
25 May 2003
TL;DR: A switch-based network-centric architecture to interconnect IP blocks is proposed with a butterfly fat tree architecture as an overall interconnect template and wormhole routing is adopted to reduce overall latency and hardware overhead.
Abstract: System on Chip (SoC) design in the forthcoming billion transistor era will involve the integration of numerous heterogeneous semiconductor intellectual property (IP) blocks. Some of the main problems in the ultra deep sub micron technologies characterized by gate lengths in the range of 50-100 nm arise from non-scalable global wire delays, failure to achieve global synchronization, errors due to signal integrity issues, and difficulties associated with non-scalable bus-based functional interconnect. These problems are addressed in this paper by introducing a new design methodology. A switch-based network-centric architecture to interconnect IP blocks is proposed. We introduce a butterfly fat tree architecture as an overall interconnect template. In this new interconnect architecture, switches are used to transfer data between IP blocks. To reduce overall latency and hardware overhead, wormhole routing is adopted. The proposed switch architecture supports this routing method. Initial implementation of the switch reveals that the total switch area is expected to amount to less than 2% of a large SoC.

223 citations


Journal ArticleDOI
TL;DR: Three redundancy analysis algorithms which can be implemented on-chip based on the local-bitmap idea are presented: the local repair-most approach is efficient for a general spare architecture, and the local optimization approach has the best repair rate.
Abstract: With the advance of VLSI technology, the capacity and density of memories is rapidly growing. The yield improvement and testing issues have become the most critical challenges for memory manufacturing. Conventionally, redundancies are applied so that the faulty cells can be repairable. Redundancy analysis using external memory testers is becoming inefficient as the chip density continues to grow, especially for the system chip with large embedded memories. This paper presents three redundancy analysis algorithms which can be implemented on-chip. Among them, two are based on the local-bitmap idea: the local repair-most approach is efficient for a general spare architecture, and the local optimization approach has the best repair rate. The essential spare pivoting technique is proposed to reduce the control complexity. Furthermore, a simulator has been developed for evaluating the repair efficiency of different algorithms. It is also used for determining certain important parameters in redundancy design. The redundancy analysis circuit can easily be integrated with the built-in self-test circuit.

189 citations


Journal ArticleDOI
TL;DR: Three unique algorithms are developed and implemented with low-power and fast circuits that reduce the maximum percent errors that result from binary-to-binary logarithm conversion to 0.9299 percent, 0.4314 percent, and 0.1538 percent.
Abstract: We present a unique 32-bit binary-to-binary logarithm converter including its CMOS VLSI implementation. The converter is implemented using combinational logic only and it calculates a logarithm approximation in a single clock cycle. Unlike other complex logarithm correcting algorithms, three unique algorithms are developed and implemented with low-power and fast circuits that reduce the maximum percent errors that result from binary-to-binary logarithm conversion to 0.9299 percent, 0.4314 percent, and 0.1538 percent. Fast 4, 16, and 32-bit leading-one detector circuits are designed to obtain the leading-one position of an input binary word. A 32-word × 5-bit MOS ROM is used to provide 5-bit integers based on the corresponding leading-one position. Both converter area and speed have been considered in the design approach, resulting in the use of a very efficient 32-bit logarithmic shifter in the 32-bit logarithmic converter. The converter is implemented using 0.6 μm CMOS technology, and it requires 1,600λ × 2,800λ of chip area. Simulations of the CMOS design for the 32-bit logarithmic converter, operating at V_DD equal to 5 volts, run at 55 MHz, and the converter consumes 20 milliwatts.

165 citations
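
The abstract above builds on a leading-one detector followed by a shift. A minimal software sketch of that idea is the classic piecewise-linear approximation: take the leading-one position as the integer part and the remaining bits as the fraction. This is an assumption about the uncorrected baseline only; the paper's three correcting algorithms are not described in the abstract and are not reproduced here, and the function names are illustrative.

```python
import math


def leading_one_position(x):
    """Index of the most significant set bit of a positive integer (bit 0 is the LSB)."""
    if x <= 0:
        raise ValueError("input must be a positive integer")
    return x.bit_length() - 1


def approx_log2(x):
    """Uncorrected piecewise-linear estimate of log2(x).

    The leading-one position k gives the integer part; the bits below the
    leading one, read as a fraction in [0, 1), stand in for log2(1 + fraction).
    """
    k = leading_one_position(x)
    fraction = (x - (1 << k)) / (1 << k)
    return k + fraction


def max_abs_error(limit):
    """Largest absolute gap between the estimate and math.log2 over 1..limit."""
    return max(abs(math.log2(x) - approx_log2(x)) for x in range(1, limit + 1))
```

The estimate is exact at powers of two and worst near the middle of each octave, which is the gap that correction schemes such as those in the paper aim to shrink.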


Journal ArticleDOI
01 Feb 2003
TL;DR: It is found that in FPGAs with more than 20 K four-input look-up tables, the reduction in channel width, interconnect delay and power dissipation can be over 50% by 3-D implementation.
Abstract: In this paper, analytical models for predicting interconnect requirements in field-programmable gate arrays (FPGAs) are presented, and opportunities for three-dimensional (3-D) implementation of FPGAs are examined. The analytical models for two-dimensional FPGAs are calibrated by routing and placement experiments with benchmark circuits and extended to 3-D FPGAs. Based on system-level modeling, we find that in FPGAs with more than 20 K four-input look-up tables, the reduction in channel width, interconnect delay and power dissipation can be over 50% by 3-D implementation.

157 citations


Proceedings ArticleDOI
09 Feb 2003
TL;DR: The techniques to be presented range over the system, software, circuit, and device levels including interconnect and I/O issues and the novel trend is to examine cooperative approaches between levels such as software-circuit cooperation and circuit-technology cooperation.
Abstract: In the coming ubiquitous-IT society, low-power design is one of the key features at which the VLSI designer should aim. Otherwise, power increase will remain as one of the main obstacles to Moore's law growth. Unless VLSI power is lowered by orders of magnitude, we cannot enjoy the progress that scaling offers. This talk will cover what we now have, and what we should provide in our low-power armory to allow us to cope with ever-increasing leakage loss, as well as dynamic power. The techniques to be presented range over the system, software, circuit, and device levels including interconnect and I/O issues. The novel trend is to examine cooperative approaches between levels such as software-circuit cooperation and circuit-technology cooperation. The biggest challenge that System-on-Chip designers must resolve in the future is the fact that transistors for digital and memory circuits will be more and more leaky as technology generations advance. Approaches to solving this serious problem will be described. Beyond the quest for low-power solutions lies a promising world of ubiquitous VLSI devices and products ranging from "wireless sensors and tags for everything" to "everything-you-must-do mobile terminals".

118 citations


Journal ArticleDOI
TL;DR: A novel VLSI architecture for a fully parallel precomputation-based content-addressable memory (PB-CAM) with low-power, low-cost, and low-voltage features and adopts the static pseudo-nMOS circuit design to improve system performance.
Abstract: This paper presents a novel VLSI architecture for a fully parallel precomputation-based content-addressable memory (PB-CAM) with low-power, low-cost, and low-voltage features. This design is based on a precomputation approach that not only saves power in the CAM system but also reduces the transistor count and operating voltage of the CAM cell. In addition, the proposed PB-CAM word structure adopts the static pseudo-nMOS circuit design to improve system performance. The whole design was fabricated with the TSMC 0.35-μm single-poly quadruple-metal CMOS process. With a 128 words by 30 bits CAM size, the measurement results indicate that the proposed circuit works up to 100 MHz with power consumption of 33 mW at 3.3-V supply voltage and works up to 30 MHz under 1.5-V supply voltage.

Journal ArticleDOI
TL;DR: It is shown that the critical path of the algorithm can be reduced if the add-MAX* operation is reordered into an offset-add-compare-select operation by adjusting the location of registers.
Abstract: This paper presents several techniques for the very large-scale integration (VLSI) implementation of the maximum a posteriori (MAP) algorithm. In general, knowledge about the implementation of the Viterbi (1967) algorithm can be applied to the MAP algorithm. Bounds are derived for the dynamic range of the state metrics which enable the designer to optimize the word length. The computational kernel of the algorithm is the add-MAX* operation, which is the add-compare-select operation of the Viterbi algorithm with an added offset. We show that the critical path of the algorithm can be reduced if the add-MAX* operation is reordered into an offset-add-compare-select operation by adjusting the location of registers. A general scheduling for the MAP algorithm is presented which gives the tradeoffs between computational complexity, latency, and memory size. Some of these architectures eliminate the need for RAM blocks with unusual form factors or can replace the RAM with registers. These architectures are suited to VLSI implementation of turbo decoders.
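
The add-MAX* kernel mentioned above can be written out directly. MAX* is the Jacobian logarithm, max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|)), so it is a compare-select plus a small offset term. The sketch below is a floating-point illustration of that identity under assumed names (`max_star`, `add_max_star`); it does not model the register placement or word lengths the paper discusses.

```python
import math


def max_star(a, b):
    """Jacobian logarithm: ln(exp(a) + exp(b)) computed as a max plus an offset."""
    offset = math.log1p(math.exp(-abs(a - b)))  # correction term, at most ln(2)
    return max(a, b) + offset


def add_max_star(metric_a, branch_a, metric_b, branch_b):
    """Add-MAX*: add branch metrics to two state metrics, then combine with MAX*."""
    return max_star(metric_a + branch_a, metric_b + branch_b)


def viterbi_acs(metric_a, branch_a, metric_b, branch_b):
    """Plain add-compare-select, i.e. the same operation with the offset dropped."""
    return max(metric_a + branch_a, metric_b + branch_b)
```

Because the offset depends only on |a - b| and is bounded by ln(2), hardware versions commonly take it from a small lookup table, which is why the operation stays close to the Viterbi add-compare-select.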

Book ChapterDOI
01 Jan 2003
TL;DR: In this article, the results of a comprehensive investigation into the characteristics and optimization of inductors fabricated with the top-level metal of a submicron silicon VLSI process are presented.
Abstract: The results of a comprehensive investigation into the characteristics and optimization of inductors fabricated with the top-level metal of a submicron silicon VLSI process are presented. A computer program which extracts a physics-based model of microstrip components that is suitable for circuit (SPICE) simulation has been used to evaluate the effect of variations in metallization, layout geometry, and substrate parameters upon monolithic inductor performance. Three-dimensional (3-D) numerical simulations and experimental measurements of inductors were also used to benchmark the model accuracy. It is shown in this work that low inductor Q is primarily due to the restrictions imposed by the thin interconnect metallization available in most very large scale integration (VLSI) technologies, and that computer optimization of the inductor layout can be used to achieve a 50% improvement in component Q-factor over unoptimized designs.

Journal ArticleDOI
TL;DR: Presents a new all-MOS circuit technique for very-low-voltage proportional-to-absolute temperature (PTAT) references and reports good agreement between analytical, simulated, and experimental data, exhibiting PSRR(DC) > 60 dB.
Abstract: Presents a new all-MOS circuit technique for very-low-voltage proportional-to-absolute temperature (PTAT) references. Optimization of supply scaling below the sum of threshold voltages is based on log companding and implemented by operating the MOSFET in weak inversion. The key design equations for current (μA) and voltage (sub-100 mV) references and their standard deviations (around 5%) are derived by analytical analysis. Two sub-1-V sub-5-μW integrated PTAT references are presented and exhaustively tested for 1.2- and 0.35-μm very large scale integration technologies. Both designs report good agreement between analytical, simulated, and experimental data, exhibiting PSRR(DC) > 60 dB. Hence, the resulting PTAT circuits are suitable for very-low-voltage system-on-a-chip applications in digital CMOS technologies.

Proceedings ArticleDOI
03 Mar 2003
TL;DR: The operation state machine (OSM) computation model is proposed to serve as the foundation of a retargetable modeling framework capable of accurately capturing complex processor behaviors and generating efficient simulators.
Abstract: Given the growth in application-specific processors, there is a strong need for a retargetable modeling framework that is capable of accurately capturing complex processor behaviors and generating efficient simulators. We propose the operation state machine (OSM) computation model to serve as the foundation of such a modeling framework. The OSM model separates the processor into two interacting layers: the operation layer where operation semantics and timing are modeled, and the hardware layer where disciplined hardware units interact. This declarative model allows for direct synthesis of micro-architecture simulators as it encapsulates precise concurrency semantics of microprocessors. We illustrate the practical benefits of this model through two case studies - the StrongARM core and the PowerPC-750 superscalar processor. The experimental results demonstrate that the OSM model has excellent modeling productivity and model efficiency. Additional applications of this modeling framework include derivation of information required by compilers and formal analysis for processor validation.

Book
01 Jan 2003
TL;DR: This book is the first book that covers all aspects of power-constrained test solutions and is a reflection of the authors' own research and also a survey of the major contributions in this domain.
Abstract: This book is the first book that covers all aspects of power-constrained test solutions. It is a reflection of the authors' own research and also a survey of the major contributions in this domain.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed sequence-of-linear-program method is orders of magnitude faster than the best-known method based on conjugate gradients with constantly better solution qualities.
Abstract: This paper presents a new method of sizing the widths of the power and ground routes in integrated circuits so that the chip area required by the routes is minimized subject to electromigration and IR voltage drop constraints. The basic idea is to transform the underlying constrained nonlinear programming problem into a sequence of linear programs. Theoretically, we show that the sequence of linear programs always converges to the optimum solution of the relaxed convex optimization problem. Experimental results demonstrate that the proposed sequence-of-linear-program method is orders of magnitude faster than the best-known method based on conjugate gradients with constantly better solution qualities.

Book
15 Dec 2003
TL;DR: The second edition of Digital Integrated Circuits: Analysis and Design focuses on timeless principles with a modern interdisciplinary view that will serve integrated circuits engineers from all disciplines for years to come.
Abstract: Exponential improvement in functionality and performance of digital integrated circuits has revolutionized the way we live and work. The continued scaling down of MOS transistors has broadened the scope of use for circuit technology to the point that texts on the topic are generally lacking after a few years. The second edition of Digital Integrated Circuits: Analysis and Design focuses on timeless principles with a modern interdisciplinary view that will serve integrated circuits engineers from all disciplines for years to come. Providing a revised instructional reference for engineers involved with Very Large Scale Integrated Circuit design and fabrication, this book delves into the dramatic advances in the field, including new applications and changes in the physics of operation made possible by relentless miniaturization. This book was conceived in the versatile spirit of the field to bridge a void that had existed between books on transistor electronics and those covering VLSI design and fabrication as a separate topic. Like the first edition, this volume is a crucial link for integrated circuit engineers and those studying the field, supplying the cross-disciplinary connections they require for guidance in more advanced work. For pedagogical reasons, the author uses SPICE level 1 computer simulation models but introduces BSIM models that are indispensable for VLSI design. This enables users to develop a strong and intuitive sense of device and circuit design by drawing direct connections between the hand analysis and the SPICE models. With four new chapters, more than 200 new illustrations, numerous worked examples, case studies, and support provided on a dynamic website, this text significantly expands concepts presented in the first edition.

Journal ArticleDOI
TL;DR: A VLSI implementation of a unique 32-bit antilogarithmic converter, which generates data for some digital-signal-processing (DSP) applications, and its combinational logic implementation requires 1500λ × 2800λ of chip area.
Abstract: This paper presents a VLSI implementation of a unique 32-bit antilogarithmic converter, which generates data for some digital-signal-processing (DSP) applications. Novel antilogarithm correcting algorithms are developed and implemented with low-power and hardware-efficient correcting circuits. The VLSI implementations of these algorithms are much smaller than other hardware intensive algorithms found in the literature. The converter is implemented using 0.6 μm CMOS technology, and its combinational logic implementation requires 1500λ × 2800λ of chip area. The 32-bit antilogarithmic converter computes the antilogarithm in a single clock cycle and runs at 100 MHz and consumes 81 mW.

Journal ArticleDOI
TL;DR: This paper presents an efficient VLSI architecture of the pipeline fast Fourier transform (FFT) processor based on radix-4 decimation-in-time algorithm with the use of digit-serial arithmetic units by combining both the feedforward and feedback commutator schemes.
Abstract: This paper presents an efficient VLSI architecture of the pipeline fast Fourier transform (FFT) processor based on radix-4 decimation-in-time algorithm with the use of digit-serial arithmetic units. By combining both the feedforward and feedback commutator schemes, the proposed architecture can not only achieve nearly 100% hardware utilization, but also require much less memory compared with the previous digit-serial FFT processors. Furthermore, in FFT processors, several modules of ROM are required for the storage of twiddle factors. By exploiting the redundancy of the factors, the overall ROM size can be effectively reduced by a factor of 2.

Journal ArticleDOI
Lei Xue, C.C. Liu, Hong-Seung Kim, S.K. Kim, Sandip Tiwari
TL;DR: In this paper, a low-thermal-budget 3-D fabrication technique, multilayers with buried structures (MLBS), was proposed for mixed-signal integration.
Abstract: Three-dimensional (3-D) integration provides opportunities in large-scale integration of mixed-signal and general system-on-chip applications with improved performance, through increased density and mixing of different active and passive technologies. This paper reports a novel low-thermal-budget 3-D fabrication technique, multilayers with buried structures (MLBS), and an analysis of its applicability to mixed-signal integration. The MLBS technique uses a low temperature of 450 °C to transfer a single-crystal silicon layer over a processed wafer consisting of buried in-plane and out-of-plane interconnects obtained through a dual Damascene process. Devices can continue to be processed on this transferred layer. Electrical characteristics of MOS capacitors (D_it = 4.7 × 10^10 cm^-2 eV^-1) and 3-D integrated planar CMOS transistors (3-D CMOS), fabricated using MLBS, are consistent with integration requirements. Our analog analysis includes an investigation of thermal effects important to analog applications with continuous operation of transistors in forward active bias, as well as of the coupling isolation derived from use of a ground-plane. Use of high density local interconnectivity improves the thermal properties of 3-D CMOS over that of silicon-on-insulator, and use of a ground plane is shown to lead to an improvement of better than 8 dB in coupling isolation.

Journal ArticleDOI
TL;DR: In this article, a planar self-aligned double-gate MOSFET was implemented, where a unique sidewall source/drain structure (S/D) permits self-aligned patterning of the back-gate layer after the S/D structure is in place.
Abstract: A planar self-aligned double-gate MOSFET process has been implemented where a unique sidewall source/drain structure (S/D) permits self-aligned patterning of the back-gate layer after the S/D structure is in place. This allows coupling the silicon thickness control inherent in a planar, unpatterned layer with VLSI self-alignment techniques and also gives independently controlled front and back gates. The demanding structure led to process innovations primarily in front-end CMP, where planarity within 5 nm was achieved on an 8-in diameter wafer as well as in silicided silicon source/drain sidewalls, with minimal encroachment of the silicide. Double-gate FET (DGFET) operation is demonstrated, with good transport at both interfaces. Dense circuit layouts are achieved with multifinger devices, and logic inverters with back-gate-controlled load current as well as NOR logic using the two gates of a single transistor as inputs are demonstrated.

Journal ArticleDOI
TL;DR: This paper develops and presents an efficient transient thermal-simulation algorithm based on the alternating-direction-implicit (ADI) method, which is not only orders of magnitude faster than the traditional thermal-simulation algorithms, but also highly accurate and efficient in memory usage.
Abstract: Due to the dramatic increase of clock frequency and integration density, power density and on-chip temperature in high-end very large scale integration (VLSI) circuits rise significantly. To ensure the timing correctness and the reliability of high-end VLSI design, efficient and accurate chip-level transient thermal simulations are of crucial importance. In this paper, we develop and present an efficient transient thermal-simulation algorithm based on the alternating-direction-implicit (ADI) method. Our algorithm, thermal-ADI, not only has a linear run time and memory requirement, but is also unconditionally stable, which ensures that time step is not limited by any stability requirement. Extensive experimental results show that our algorithm is not only orders of magnitude faster than the traditional thermal-simulation algorithms, but also highly accurate and efficient in memory usage.
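
To make the ADI idea concrete, here is a minimal sketch of one textbook Peaceman-Rachford step for the 2-D heat equation on a uniform grid with fixed boundary temperatures. It is not the paper's thermal-ADI code: the 2-D restriction, the absence of heat sources, the function names, and the parameter `r` (assumed to equal alpha * dt / (2 * h * h)) are all assumptions made for illustration. Each half-step is implicit in one direction only, so the work reduces to independent tridiagonal solves whose total cost is linear in the number of grid points.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system with constant coefficients."""
    n = len(rhs)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c_prime[i - 1]
        c_prime[i] = upper / denom
        d_prime[i] = (rhs[i] - lower * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def adi_step(temp, r):
    """Advance temp[i][j] by one time step; boundary entries stay fixed.

    Requires at least a 3 x 3 grid. r = alpha * dt / (2 * h * h) (assumed).
    """
    n, m = len(temp), len(temp[0])
    half = [row[:] for row in temp]
    # Half-step 1: implicit along i, explicit along j.
    for j in range(1, m - 1):
        rhs = [temp[i][j] + r * (temp[i][j - 1] - 2 * temp[i][j] + temp[i][j + 1])
               for i in range(1, n - 1)]
        rhs[0] += r * temp[0][j]        # known boundary value moves to the right side
        rhs[-1] += r * temp[n - 1][j]
        column = solve_tridiagonal(-r, 1 + 2 * r, -r, rhs)
        for i in range(1, n - 1):
            half[i][j] = column[i - 1]
    new = [row[:] for row in half]
    # Half-step 2: implicit along j, explicit along i.
    for i in range(1, n - 1):
        rhs = [half[i][j] + r * (half[i - 1][j] - 2 * half[i][j] + half[i + 1][j])
               for j in range(1, m - 1)]
        rhs[0] += r * half[i][0]
        rhs[-1] += r * half[i][m - 1]
        row = solve_tridiagonal(-r, 1 + 2 * r, -r, rhs)
        for j in range(1, m - 1):
            new[i][j] = row[j - 1]
    return new
```

A quick sanity check is that a uniform grid stays uniform, and that a hot spot with zero boundaries loses energy even when `r` is far larger than an explicit scheme would tolerate, which is the practical meaning of unconditional stability.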

Journal ArticleDOI
TL;DR: In this paper, a general-purpose low-voltage rail-to-rail input stage suitable for analog and mixed-signal applications was proposed, which is compatible with deep submicrometer CMOS devices, where the familiar voltage-to-current square law in saturation is not completely satisfied.
Abstract: This paper introduces a general-purpose low-voltage rail-to-rail input stage suitable for analog and mixed-signal applications. The proposed circuit provides, simultaneously, constant small-signal and large-signal behaviors over the entire input common-mode voltage range, while imposing no appreciable constraint for high-frequency operation. In addition, the accuracy of the circuit does not rely on any strict matching of the devices, unlike most of the traditional approaches based on complementary input pairs, which need to compensate for the difference in mobility between electrons and holes with the transistor aspect ratios. Also, the technique is compatible with deep submicrometer CMOS devices, where the familiar voltage-to-current square law in saturation is not completely satisfied. Based on the proposed input stage, a transconductor with rail-to-rail input common-mode range and an input/output rail-to-rail operational amplifier were developed. Both cells were designed to operate with a 3-V single supply and fabricated in standard 0.8-/spl mu/m CMOS technology. Experimental results are provided.

Proceedings ArticleDOI
27 Apr 2003
TL;DR: It is shown that any test set that detects all single stuck-at faults in a reversible circuit also detects all multiple stuck- at faults, and a practical test-set generation algorithm is given, based on an integer linear programming formulation, that yields test sets approximately half the size of those produced by conventional automatic test pattern generation.
Abstract: Irreversible computation necessarily results in energy dissipation due to information loss. While small in comparison to the power consumption of today's VLSI circuits, if current trends continue this will be a critical issue in the near future. Reversible circuits offer an alternative that, in principle, allows computation with arbitrarily small energy dissipation. Furthermore, reversible circuits are essential components of quantum logic. We consider the problem of testing these circuits, and in particular generating efficient test sets. The reversibility property significantly simplifies the problem, which is generally hard for the irreversible case. We discuss conditions for a test set to be complete, give a number of practical constructions, and consider test sets for worst-case circuits. In addition, we formulate the problem of finding minimal test sets into an integer linear program (ILP) with binary variables. While this ILP method is infeasible for large circuits, we show that combining it with a circuit decomposition approach yields a practical alternative.

Proceedings ArticleDOI
23 Mar 2003
TL;DR: In this article, a high-index contrast material system, micrometer sized building blocks, and IC fabrication techniques are leveraged in order to realize commercial VLSI photonic circuits.
Abstract: A new high-index contrast material system, micrometer sized building blocks, and IC fabrication techniques, are leveraged in order to realize commercial VLSI photonic circuits.

Proceedings ArticleDOI
27 Dec 2003
TL;DR: An efficient hardware implementation of the RC4 stream-cipher integrates in the same hardware module an 8-bit up to 128-bit key length capability and achieves a data throughput of up to 22 Mbytes/sec at a maximum frequency of 64 MHz.
Abstract: In this paper, an efficient hardware implementation of the RC4 stream-cipher is proposed. In contrast to previous designs, which support only fixed-length keys, the proposed implementation supports key lengths from 8 bits up to 128 bits in the same hardware module. Independently of the key length, the proposed VLSI implementation achieves a data throughput of up to 22 Mbytes/sec at a maximum frequency of 64 MHz. The whole design was captured in VHDL, and an FPGA device was used for the hardware implementation of the architecture. A detailed analysis in terms of performance and covered area is presented.
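
For readers unfamiliar with the cipher itself, the following is a software reference sketch of standard RC4 (key scheduling followed by keystream generation). It says nothing about the paper's hardware architecture; the function names are illustrative, and the 1 to 16 byte key check is an assumption that simply mirrors the 8-bit to 128-bit range quoted in the abstract.

```python
def rc4_keystream(key, length):
    """Generate `length` keystream bytes from a key of 1 to 16 bytes."""
    if not 1 <= len(key) <= 16:
        raise ValueError("key must be 8 to 128 bits (1 to 16 bytes)")
    # Key-scheduling algorithm: permute the 256-entry state under key control.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    # Pseudo-random generation algorithm: one swap and one lookup per output byte.
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def rc4_crypt(key, data):
    """Encrypt or decrypt by XORing data with the keystream (the operation is symmetric)."""
    keystream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

The key length only affects the scheduling loop through `key[i % len(key)]`, which is consistent with the abstract's remark that throughput is independent of key length. RC4 is no longer considered secure, so this sketch is for understanding only.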

Proceedings ArticleDOI
15 Jun 2003
TL;DR: In this article, the authors compare the energy-delay trade-offs of high-performance 32-bit and 64-bit processor adders in 0.13-μm and 0.10-μm CMOS technologies, with an accuracy of 8% in delay estimates and 20% in energy estimates, compared with simulated data.
Abstract: We motivate the concept of comparing VLSI adders based on their energy-delay trade-offs and present a technique for estimating the energy-delay space of various high-performance VLSI adder topologies. Further, we show that our estimates accurately represent tradeoffs in the energy-delay space for high-performance 32-bit and 64-bit processor adders in 0.13-μm and 0.10-μm CMOS technologies, with an accuracy of 8% in delay estimates and 20% in energy estimates, compared with simulated data.

Proceedings ArticleDOI
06 Apr 2003
TL;DR: A floating-point to fixed-point conversion (FFC) methodology for digital VLSI signal processing systems is proposed based on a statistical approach and global optimization which allows a high degree of automation.
Abstract: We propose a floating-point to fixed-point conversion (FFC) methodology for digital VLSI signal processing systems. The past techniques used to facilitate FFC are first reviewed, followed by a description based on a statistical approach and global optimization which allows a high degree of automation.
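
The abstract does not spell out the statistical model or the global optimizer, so the sketch below illustrates only the underlying conversion problem: quantize floating-point samples to a signed fixed-point format and pick the smallest fraction width that meets an error target. The exhaustive search, the RMS error criterion, and all names (`quantize`, `rms_error`, `min_fraction_bits`) are assumptions for illustration, not the paper's methodology.

```python
import math


def quantize(value, frac_bits, total_bits=16):
    """Round to a signed fixed-point value with saturation on overflow."""
    scale = 1 << frac_bits
    code = math.floor(value * scale + 0.5)          # round half up
    max_code = (1 << (total_bits - 1)) - 1
    min_code = -(1 << (total_bits - 1))
    code = max(min_code, min(max_code, code))       # saturate instead of wrapping
    return code / scale


def rms_error(samples, frac_bits, total_bits=16):
    """Root-mean-square quantization error over a set of sample values."""
    squared = [(s - quantize(s, frac_bits, total_bits)) ** 2 for s in samples]
    return math.sqrt(sum(squared) / len(squared))


def min_fraction_bits(samples, target_rms, total_bits=16):
    """Smallest fraction width whose RMS error meets the target, or None."""
    for frac_bits in range(total_bits):
        if rms_error(samples, frac_bits, total_bits) <= target_rms:
            return frac_bits
    return None
```

In a real flow the samples would come from simulating the floating-point design on representative inputs, and the search would run jointly over many signals, which is where an automated, optimization-based approach becomes necessary.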

Journal ArticleDOI
09 Feb 2003
TL;DR: A novel nonvolatile logic style, called complementary ferroelectric-capacitor (CFC) logic, is proposed for low-power logic-in-memory VLSI, in which storage elements are distributed over the logic-circuit plane.
Abstract: A novel nonvolatile logic style, called complementary ferroelectric-capacitor (CFC) logic, is proposed for low-power logic-in-memory VLSI, in which storage elements are distributed over the logic-circuit plane. Standby currents in distributed storage elements can be cut off by using ferroelectric-based nonvolatile storage elements, and the standby power dissipation can be greatly reduced. Since the nonvolatile storage and the switching functions are merged into ferroelectric capacitors by the capacitive coupling effect, reduction of active device counts can be achieved. The use of complementary stored data in coupled ferroelectric capacitors makes it possible to perform a switching operation with small degradation of the nonvolatile charge at a low supply voltage. The restore operation can be performed by only applying the small bias across the ferroelectric capacitor, which reduces the dynamic power dissipation. Applying the proposed circuitry in a fully parallel 32-bit content-addressable memory results in about 2/3 dynamic power reduction and 1/7700 static power reduction with chip size of 1/3, compared to a CMOS implementation using 0.6-μm ferroelectric/CMOS.