Showing papers in "IEEE Transactions on Very Large Scale Integration Systems in 2010"

Open Access

Journal Article•DOI•

The Impact of NBTI Effect on Combinational Circuit: Modeling, Simulation, and Analysis

Wenping Wang¹, Shengqi Yang², Sarvesh Bhardwaj³, Sarma Vrudhula¹, Frank Liu⁴, Yu Cao¹•Institutions (4)

Arizona State University¹, Shanghai University², Synopsys³, IBM⁴

01 Feb 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper develops a hierarchical framework for analyzing the impact of NBTI on the performance of logic circuits under various operation conditions, such as the supply voltage, temperature, and node switching activity, and proposes an efficient method to predict the degradation of circuit speed over a long period of time.

Abstract: Negative-bias-temperature instability (NBTI) has become the primary limiting factor of circuit life time. In this paper, we develop a hierarchical framework for analyzing the impact of NBTI on the performance of logic circuits under various operation conditions, such as the supply voltage, temperature, and node switching activity. Given a circuit topology and input switching activity, we propose an efficient method to predict the degradation of circuit speed over a long period of time. The effectiveness of our method is comprehensively demonstrated with the International Symposium on Circuits and Systems (ISCAS) benchmarks and a 65-nm industrial design. Furthermore, we extract the following key design insights for reliable circuit design under NBTI effect, including: 1) During dynamic operation, NBTI-induced degradation is relatively insensitive to supply voltage, but strongly dependent on temperature; 2) There is an optimum supply voltage that leads to the minimum of circuit performance degradation; circuit degradation rate actually goes up if supply voltage is lower than the optimum value; 3) Circuit performance degradation due to NBTI is highly sensitive to input vectors. The difference in delay degradation is up to 5× for various static and dynamic operations. Finally, we examine the interaction between NBTI effect, and process and design uncertainty in realistic conditions.

297 citations

Journal Article•DOI•

Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing

Ning Zhu¹, Wang Ling Goh¹, Weija Zhang¹, Kiat Seng Yeo¹, Zhi Hui Kong¹•Institutions (1)

Nanyang Technological University¹

01 Aug 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel error-tolerant adder (ETA) is proposed that is able to ease the strict restriction on accuracy, and at the same time achieve tremendous improvements in both the power consumption and speed performance.

Abstract: In modern VLSI technology, the occurrence of all kinds of errors has become inevitable. By adopting an emerging concept in VLSI design and test, error tolerance (ET), a novel error-tolerant adder (ETA) is proposed. The ETA is able to ease the strict restriction on accuracy, and at the same time achieve tremendous improvements in both the power consumption and speed performance. When compared to its conventional counterparts, the proposed ETA is able to attain more than 65% improvement in the Power-Delay Product (PDP). One important potential application of the proposed ETA is in digital signal processing systems that can tolerate certain amount of errors.
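
To make the accuracy-for-efficiency trade concrete, the sketch below shows a generic approximate adder in Python. It is not the ETA circuit from the paper, whose internals the abstract does not describe. As an assumption for illustration it uses a lower-part OR approximation: the upper bits are added exactly, and the lowest `approx_bits` bits are combined with a carry-free bitwise OR. The function names and parameters are hypothetical.

```python
def approx_add(a: int, b: int, width: int = 16, approx_bits: int = 4) -> int:
    """Add two unsigned `width`-bit integers with an inexact lower part.

    Assumption for illustration only (not the paper's ETA): the low
    `approx_bits` bits are ORed with no carry chain, and the remaining
    high bits are added exactly.
    """
    if not 0 <= approx_bits <= width:
        raise ValueError("approx_bits must be between 0 and width")
    low_mask = (1 << approx_bits) - 1
    # Carry-free lower part: a bitwise OR stands in for the true sum bits.
    low = (a | b) & low_mask
    # Exact upper part: no carry arrives from the lower part.
    high = (a >> approx_bits) + (b >> approx_bits)
    # Keep one extra bit for the carry out of the upper part.
    return ((high << approx_bits) | low) & ((1 << (width + 1)) - 1)


def max_error(width: int = 8, approx_bits: int = 3) -> int:
    """Exhaustively find the largest absolute error for a small width."""
    limit = 1 << width
    return max(
        abs((a + b) - approx_add(a, b, width, approx_bits))
        for a in range(limit)
        for b in range(limit)
    )
```

In this sketch the error equals the bitwise AND of the two lower parts, so it never exceeds `2**approx_bits - 1`. Shortening the carry chain in this way is where a hardware version would save delay and power.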

286 citations

Journal Article•DOI•

Understanding the Effect of Process Variations on the Delay of Static and Domino Logic

Massimo Alioto, Gaetano Palumbo, M. Pennisi

01 May 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Analysis shows that domino logic circuits suffer from a doubled variability as compared to the static CMOS logic style, which adds to the well-known speed degradation due to the current contention associated with the keeper transistor.

Abstract: In this paper, the effect of process variations on delay is analyzed in depth for both static and dynamic CMOS logic styles. Analysis allows for gaining an insight into the delay dependence on fan-in, fan-out, and sizing in sub-100-nm technologies. Simple but reasonably accurate models are derived to capture the basic dependences. The effect of process variations in transistor stacks is analytically modeled and analyzed in detail. The impact of both interdie and intradie variations is evaluated and discussed. Interestingly, the input capacitance of static and dynamic logic is shown to be rather insensitive to variations. The delay variability was also shown to be a weak function of the input rise/fall time and load. Analysis shows that domino logic circuits suffer from a doubled variability as compared to the static CMOS logic style. The positive feedback associated with the keeper transistor is shown to be responsible for the variability increase, which, in turn, limits the speed performance. This adds to the well-known speed degradation due to the current contention associated with the keeper transistor. Monte Carlo simulations on a 90-nm technology, including layout parasitics, are performed to validate the results.

183 citations

Journal Article•DOI•

C-Pack: A High-Performance Microprocessor Cache Compression Algorithm

Xi Chen¹, Lei Yang², Robert P. Dick¹, Li Shang³, Haris Lekatsas⁴•Institutions (4)

University of Michigan¹, Google², University of Colorado Boulder³, Princeton University⁴

01 Aug 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This work presents a lossless compression algorithm that has been designed for fast on-line data compression, and cache compression in particular, and reduces the proposed algorithm to a register transfer level hardware design, permitting performance, power consumption, and area estimation.

Abstract: Microprocessor designers have been torn between tight constraints on the amount of on-chip cache memory and the high latency of off-chip memory, such as dynamic random access memory. Accessing off-chip memory generally takes an order of magnitude more time than accessing on-chip cache, and two orders of magnitude more time than executing an instruction. Computer systems and microarchitecture researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors in order to improve performance, energy efficiency, and functionality. However, most past work, and all work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the proposed compression algorithms and hardware. It is not possible to determine whether compression at levels of the memory hierarchy closest to the processor is beneficial without understanding its costs. Furthermore, as we show in this paper, raw compression ratio is not always the most important metric. In this work, we present a lossless compression algorithm that has been designed for fast on-line data compression, and cache compression in particular. The algorithm has a number of novel features tailored for this application, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words while using a single dictionary and without degradation in compression ratio. We reduced the proposed algorithm to a register transfer level hardware design, permitting performance, power consumption, and area estimation. Experiments comparing our work to previous work are described.

161 citations

Journal Article•DOI•

Yield-Driven Near-Threshold SRAM Design

Gregory K. Chen¹, Dennis Sylvester¹, David Blaauw¹, Trevor Mudge¹•Institutions (1)

University of Michigan¹

01 Nov 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The supply voltage for minimum total energy operation (VMIN), calculated from the activity factor, is found to be significantly higher for SRAM than for logic, and SRAM robustness is calculated using importance sampling, resulting in a seven-order run-time improvement over Monte Carlo sampling.

Abstract: Voltage scaling is desirable in static RAM (SRAM) to reduce energy consumption. However, commercial SRAM is susceptible to functional failures when VDD is scaled down. Although several published SRAM designs scale VDD to 200-300 mV, these designs do not sufficiently consider SRAM robustness, limiting them to small arrays because of yield constraints, and may not correctly target the minimum energy operation point. We examine the effects on area and energy for the differential 6T and 8T bit cells as VDD is scaled down, and the bit cells are either sized and doped, or assisted appropriately to maintain the same yield as with full VDD. SRAM robustness is calculated using importance sampling, resulting in a seven-order run-time improvement over Monte Carlo sampling. Scaling 6T and 8T SRAM VDD down to 500 mV and scaling 8T SRAM to 300 mV results in a 50% and 83% dynamic energy reduction, respectively, with no reduction in robustness and low area overhead, but increased leakage per bit. Using this information, we calculate the supply voltage for a minimum total energy operation (VMIN) based on activity factor and find that it is significantly higher for SRAM than for logic.
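
The run-time advantage of importance sampling over plain Monte Carlo for rare failures can be illustrated with a toy problem. The sketch below does not use the paper's SRAM failure model, which the abstract does not give. As a stand-in assumption, a "failure" is a standard normal variable exceeding a hypothetical threshold of 4 sigma. The importance-sampling estimator draws from a normal distribution shifted to the threshold and reweights each sample by the density ratio. All function names are hypothetical.

```python
import math
import random


def tail_prob_exact(threshold: float) -> float:
    """Exact P(X > threshold) for a standard normal X."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def tail_prob_monte_carlo(threshold: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo: count how often a N(0, 1) draw exceeds the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def tail_prob_importance(threshold: float, n: int, rng: random.Random) -> float:
    """Importance sampling: draw from N(threshold, 1) and reweight.

    The weight is the ratio of the N(0, 1) density to the N(threshold, 1)
    density, which simplifies to exp(-threshold * x + threshold**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold * threshold)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact      :", tail_prob_exact(4.0))
    print("monte carlo:", tail_prob_monte_carlo(4.0, 20000, rng))
    print("importance :", tail_prob_importance(4.0, 20000, rng))
```

With a tail probability of about 3 in 100,000, a plain Monte Carlo run of 20,000 samples expects less than one failing sample, while roughly half of the shifted samples land in the failure region. That difference in useful samples is the intuition behind the speedup, though this toy example says nothing about the specific seven-order figure reported for SRAM.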

136 citations

Journal Article•DOI•

Design Paradigm for Robust Spin-Torque Transfer Magnetic RAM (STT MRAM) From Circuit/Architecture Perspective

Jing Li¹, Patrick Ndai¹, Ashish Goel¹, Sayeef Salahuddin², Kaushik Roy¹•Institutions (2)

Purdue University¹, University of California, Berkeley²

01 Dec 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper analyzed and modeled the failure probabilities of STT MRAM cells due to parameter variations and developed an efficient design paradigm from a circuit and/or architecture perspective to improve the robustness and integration density.

Abstract: Spin-torque transfer magnetic RAM (STT MRAM) is a promising candidate for future embedded applications. It combines the desirable attributes of current memory technologies such as SRAM, DRAM, and flash memories (fast access time, low cost, high density, and non-volatility). It also solves the critical drawbacks of conventional MRAM technology: poor scalability and high write current. However, variations in process parameters can cause a large number of cells to fail, severely affecting the yield of the memory array. In this paper, we analyzed and modeled the failure probabilities of STT MRAM cells due to parameter variations. Based on the model, we performed a thorough analysis of the impact of design parameters on parametric failures due to process variations. To achieve high memory yield without incurring expensive technology modification, we developed an efficient design paradigm from a circuit and/or architecture perspective to improve the robustness and integration density. The proposed technique effectively relaxes or completely decouples the conflicting design requirements for read stability, writability and cell area. It can be used at an early stage of the design cycle for yield enhancement.

131 citations

Journal Article•DOI•

A Sensitivity Analysis of Power Signal Methods for Detecting Hardware Trojans Under Real Process and Environmental Conditions

Reza M. Rad¹, Jim Plusquellic², M. Tehranipoor³•Institutions (3)

University of Maryland, Baltimore¹, University of New Mexico², University of Connecticut³

01 Dec 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper investigates the sensitivity of a power supply transient signal analysis method for detecting Trojans and focuses on determining the smallest detectable Trojan, i.e., the least number of gates a Trojan may have and still be detected, using a set of process simulation models that characterize a TSMC 0.18 μm process.

Abstract: Trust in reference to integrated circuits addresses the concern that the design and/or fabrication of the integrated circuit (IC) may be purposely altered by an adversary. The insertion of a hardware Trojan involves a deliberate and malicious change to an IC that adds or removes functionality or reduces its reliability. Trojans are designed to disable and/or destroy the IC at some future time or they may serve to leak confidential information covertly to the adversary. Trojans can be cleverly hidden by the adversary to make it extremely difficult for chip validation processes, such as manufacturing test, to accidentally discover them. This paper investigates the sensitivity of a power supply transient signal analysis method for detecting Trojans. In particular, we focus on determining the smallest detectable Trojan, i.e., the least number of gates a Trojan may have and still be detected, using a set of process simulation models that characterize a TSMC 0.18 μm process. We also evaluate the sensitivity of our Trojan detection method in the presence of measurement noise and background switching activity.

124 citations

Journal Article•DOI•

An On-Chip NBTI Sensor for Measuring pMOS Threshold Voltage Degradation

John Keane¹, Tae-Hyoung Kim¹, Chris H. Kim¹•Institutions (1)

University of Minnesota¹

01 Jun 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This work describes an on-chip NBTI degradation sensor using a delay-locked loop (DLL), in which the increase in pMOS threshold voltage due to NBTI stress is translated into a control voltage shift in the DLL for high sensing gain.

Abstract: Negative bias temperature instability (NBTI) is one of the most critical device reliability issues in sub-130 nm CMOS processes. In order to better understand the characteristics of this mechanism, accurate and efficient means of measuring its effects must be explored. In this work, we describe an on-chip NBTI degradation sensor using a delay-locked loop (DLL), in which the increase in pMOS threshold voltage due to NBTI stress is translated into a control voltage shift in the DLL for high sensing gain. The proposed sensor is capable of supporting both DC and AC stress modes. Measurements from a test chip fabricated in a 130 nm bulk CMOS process show an average gain of 10 in the operating range of interest, with measurement times in tens of microseconds possible for minimal unwanted threshold voltage recovery. NBTI degradation readings across a range of operating conditions are presented to demonstrate the flexibility of this system.

102 citations

Journal Article•DOI•

Dynamic Bit-Width Adaptation in DCT: An Approach to Trade Off Image Quality and Computation Energy

Jongsun Park¹, Jung Hwan Choi², Kaushik Roy³•Institutions (3)

Korea University¹, Samsung², Purdue University³

01 May 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A dynamic bit-width adaptation scheme for applications using discrete cosine transform (DCT) that can efficiently trade off image quality and computation energy is presented, together with a bit-width selection algorithm to select the appropriate operand bit-widths.

Abstract: This paper presents a dynamic bit-width adaptation scheme for applications using discrete cosine transform (DCT). The technique can efficiently trade off image quality and computation energy. Based on sensitivity differences of 64 DCT coefficients, separate operand bit-widths are used for different frequency components to reduce computation energy. To select the appropriate operand bit-widths that achieve significant reduction of power consumption with minimum image quality degradation, we also propose a bit-width selection algorithm. The proposed variable bit precision DCT algorithm can be efficiently implemented using carry save adder trees. The reconfigurable DCT architecture can achieve power savings ranging from 36% to 75% compared to normal operation at the expense of minor image quality degradation.
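
The idea of giving different frequency components different operand bit-widths can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's architecture or its selection algorithm: it uses a 1-D 8-point DCT-II rather than the 64-coefficient 2-D case, models a narrower operand by zeroing the least significant bits of each 8-bit input sample, and uses a hypothetical, hand-picked bit-width table. All names are hypothetical.

```python
import math


def truncate_operand(value: int, keep_bits: int, total_bits: int = 8) -> int:
    """Zero the least significant bits so only `keep_bits` MSBs remain."""
    drop = total_bits - keep_bits
    return (value >> drop) << drop


def dct_1d(samples, bit_widths, total_bits=8):
    """Orthonormal DCT-II where coefficient k sees operands of bit_widths[k] bits."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i, x in enumerate(samples):
            # Narrower operands for the coefficients assumed less sensitive.
            xq = truncate_operand(x, bit_widths[k], total_bits)
            acc += xq * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        coeffs.append(scale * acc)
    return coeffs


if __name__ == "__main__":
    row = [52, 55, 61, 66, 70, 61, 64, 73]    # example 8-bit pixel row
    full = dct_1d(row, [8] * 8)               # full precision everywhere
    adaptive = dct_1d(row, [8, 8, 7, 7, 6, 6, 5, 5])  # hypothetical widths
    for k, (f, a) in enumerate(zip(full, adaptive)):
        print(k, round(f, 3), round(a, 3))
```

In this sketch the low-frequency coefficients are computed at full width and stay exact, while the error in a high-frequency coefficient is bounded by the number of dropped bits. A real design would choose the table from measured image-quality sensitivity, which is the role the abstract assigns to the bit-width selection algorithm.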

95 citations

Journal Article•DOI•

VLSI Implementation of BCH Error Correction for Multilevel Cell NAND Flash Memory

Hyojin Choi¹, Wei Liu¹, Wonyong Sung¹•Institutions (1)

Seoul National University¹

01 May 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Three error-correcting architectures, named as whole-page, sector-pipelined, and multistrip ones, are proposed and the VLSI design applies both algorithmic and architectural-level optimizations that include parallel algorithm transformation, resource sharing, and time multiplexing.

Abstract: Bit-error correction is crucial for realizing cost-effective and reliable NAND Flash-memory-based storage systems. In this paper, low-power and high-throughput error-correction circuits have been developed for multilevel cell (MLC) NAND Flash memories. The developed circuits employ the Bose-Chaudhuri-Hocquenghem code to correct multiple random bit errors. The error-correcting codes for them are designed based on the bit-error characteristics of MLC NAND Flash memories for solid-state drives. To trade off the code rate, circuit complexity, and power consumption, three error-correcting architectures, named whole-page, sector-pipelined, and multistrip, are proposed. The VLSI design applies both algorithmic and architectural-level optimizations that include parallel algorithm transformation, resource sharing, and time multiplexing. The chip area, power consumption, and throughput results for these three architectures are presented.

94 citations

Journal Article•DOI•

Dynamic and Leakage Energy Minimization With Soft Real-Time Loop Scheduling and Voltage Assignment

Meikang Qiu¹, Laurence T. Yang², Zili Shao, Edwin H.-M. Sha³•Institutions (3)

University of New Orleans¹, St. Francis Xavier University², University of Texas at Dallas³

01 Mar 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: An optimal soft real-time loop scheduling and voltage assignment algorithm, called loop scheduling and voltage assignment to minimize energy, is proposed to minimize both dynamic and leakage energy via DVS and ABB.

Abstract: With the shrinking of technology feature sizes, the share of leakage in total power consumption of digital systems continues to grow. Traditional dynamic voltage scaling (DVS) fails to accurately address the impact of scaling on system power consumption as the leakage power increases exponentially. The combination of DVS and adaptive body biasing (ABB) is an effective technique to jointly optimize dynamic and leakage energy dissipation. In this paper, we propose an optimal soft real-time loop scheduling and voltage assignment algorithm, loop scheduling and voltage assignment to minimize energy, to minimize both dynamic and leakage energy via DVS and ABB. Voltage transition overhead has been considered in our approach. We conduct simulations on a set of digital signal processor benchmarks based on the power model of 70 nm technology. The simulation results show that our approach achieves significant energy saving compared to that of the integer linear programming approach.

Journal Article•DOI•

Design of Spin-Torque Transfer Magnetoresistive RAM and CAM/TCAM with High Sensing and Search Speed

Wei Xu¹, Tong Zhang¹, Yi Chen²•Institutions (2)

Rensselaer Polytechnic Institute¹, Seagate Technology²

01 Jan 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new RAM cell structure design is proposed that can realize high speed and reliable sensing operations in the presence of relatively poor magnetoresistive ratio, while maintaining low sensing current through magnetic tunneling junctions (MTJs).

Abstract: With a great scalability potential, nonvolatile magnetoresistive memory with spin-torque transfer (STT) programming has become a topic of great current interest. This paper addresses cell structure design for STT magnetoresistive RAM, content addressable memory (CAM) and ternary CAM (TCAM). We propose a new RAM cell structure design that can realize high speed and reliable sensing operations in the presence of relatively poor magnetoresistive ratio, while maintaining low sensing current through magnetic tunneling junctions (MTJs). We further apply the same basic design principle to develop new cell structures for nonvolatile CAM and TCAM. The effectiveness of the proposed RAM, CAM and TCAM cell structures has been demonstrated by circuit simulation at 0.18 μm CMOS technology.

Journal Article•DOI•

Self-Adaptive System for Addressing Permanent Errors in On-Chip Interconnects

Teijo Lehtonen¹, David Wolpert², Pasi Liljeberg³, Juha Plosila⁴, Paul Ampadu²•Institutions (4)

Turku Centre for Computer Science¹, University of Rochester², Information Technology University³, Academy of Finland⁴

01 Apr 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A self-contained adaptive system for detecting and bypassing permanent errors in on-chip interconnects that reroutes data on erroneous links to a set of spare wires without interrupting the data flow is presented.

Abstract: We present a self-contained adaptive system for detecting and bypassing permanent errors in on-chip interconnects. The proposed system reroutes data on erroneous links to a set of spare wires without interrupting the data flow. To detect permanent errors at runtime, a novel in-line test (ILT) method using spare wires and a test pattern generator is proposed. In addition, an improved syndrome storing-based detection (SSD) method is presented and compared to the ILT method. Each detection method (ILT and SSD) is integrated individually into the noninterrupting adaptive system, and a case study is performed to compare them with Hamming and Bose-Chaudhuri-Hocquenghem (BCH) code implementations. In the presence of permanent errors, the probability of correct transmission in the proposed systems is improved by up to 140% over the standalone Hamming code. Furthermore, our methods achieve up to 38% area, 64% energy, and 61% latency improvements over the BCH implementation at comparable error performance.
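
The rerouting step described above, moving data from erroneous wires onto spares, can be sketched as a simple wire-mapping model. This is a hedged illustration, not the authors' circuit: it assumes the set of faulty wires is already known (the ILT and SSD detection methods are not modelled), it models a permanent error as a wire stuck at 0, and it assigns data bits to healthy wires in index order. All names are hypothetical.

```python
def build_wire_map(num_data_bits, num_spares, faulty_wires):
    """Assign each data bit to a healthy physical wire, skipping faulty ones."""
    total = num_data_bits + num_spares
    healthy = [w for w in range(total) if w not in faulty_wires]
    if len(healthy) < num_data_bits:
        raise ValueError("more faulty wires than available spares")
    return healthy[:num_data_bits]


def transmit(bits, wire_map, total_wires, faulty_wires):
    """Drive the link; a faulty wire is modelled as stuck at 0 (assumption)."""
    wires = [0] * total_wires
    for bit, wire in zip(bits, wire_map):
        wires[wire] = bit
    for w in faulty_wires:
        wires[w] = 0
    return wires


def receive(wires, wire_map):
    """Read the data bits back from the wires they were mapped to."""
    return [wires[w] for w in wire_map]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    faulty = {1, 3}
    # Without rerouting, bits on wires 1 and 3 are lost to the stuck-at fault.
    naive = receive(transmit(data, [0, 1, 2, 3], 6, faulty), [0, 1, 2, 3])
    # With two spare wires, the data bypasses the faulty wires.
    wire_map = build_wire_map(4, 2, faulty)
    rerouted = receive(transmit(data, wire_map, 6, faulty), wire_map)
    print(naive, rerouted)
```

In a hardware link both ends must agree on the mapping, so the sender and receiver here share the same `wire_map`. How the real system updates that mapping without interrupting the data flow is beyond what the abstract specifies.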

Journal Article•DOI•

DSP-Driven Self-Tuning of RF Circuits for Process-Induced Performance Variability

Donghoon Han¹, Byung-Sung Kim², Abhijit Chatterjee³•Institutions (3)

Texas Instruments¹, Sungkyunkwan University², Georgia Institute of Technology³

01 Feb 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A postmanufacture self-tuning technique that aims to compensate for multiparameter variations is presented, which incorporates a response feature and hardware tuning knobs designed into the RF circuit and can be applied to other RF circuits as well.

Abstract: In the deep-submicrometer design regime, RF circuits are expected to be increasingly susceptible to process variations, and thereby suffer from significant loss of parametric yield. To address this problem, a postmanufacture self-tuning technique that aims to compensate for multiparameter variations is presented. The proposed method incorporates a "response feature" detector and "hardware tuning knobs," designed into the RF circuit. The RF device test response to a specially crafted diagnostic test stimulus is logged via the built-in detector and embedded analog-to-digital converter. Analysis and prediction of the optimal tuning knob control values for performance compensation is performed using software running on the baseband DSP processor. As a result, the RF circuit performance can be diagnosed and tuned with minimal assistance from external test equipment. Multiple RF performance parameters can be adjusted simultaneously under tuning knob control. The proposed concepts are illustrated for an RF low-noise amplifier (LNA) design and can be applied to other RF circuits as well. A simulation case study and hardware measurements on fabricated 1.9-GHz LNAs show significant parametric yield enhancement (up to 58%) across the critical RF performance specifications of interest.

Journal Article•DOI•

Design Margin Exploration of Spin-Transfer Torque RAM (STT-RAM) in Scaled Technologies

Yi Chen¹, Xiaobin Wang¹, Hai Li, Haiwen Xi¹, Yuan Yan², Wenzhong Zhu¹•Institutions (2)

Seagate Technology¹, Hitachi²

01 Dec 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: By using the model, the scalability of STT-RAM technology down to a 22-nm Bulk-CMOS technology node is analyzed and the tradeoffs among the MTJ switching current, the thermal stability of the MTJ and the MOS transistor driving strength are discussed.

Abstract: We propose a magnetic and electric level spin-transfer torque random access memory (STT-RAM) cell model to simulate the write operation of an STT-RAM. The model of a magnetic tunneling junction (MTJ) is modified to take into account the electrical response of the MOS transistor that is connected to the MTJ. A dynamic design flow is also proposed to minimize any unnecessary design margin in an STT-RAM cell design by leveraging the new STT-RAM cell model. The design of an STT-RAM cell with a one-transistor-one-MTJ (1T1J) structure shows that our technique can reduce the STT-RAM cell area by more than 22%, compared with a conventional STT-RAM cell model at a TSMC 90-nm technology node. The performance and the reliability of the memory cell were unaffected. By using our model, we analyzed the scalability of STT-RAM technology down to a 22-nm Bulk-CMOS technology node. The tradeoffs among the MTJ switching current, the thermal stability of the MTJ and the MOS transistor driving strength are discussed. Some magnetic- and circuit-level solutions to achieve 9F² STT-RAM cell area at 22-nm technology node are also discussed.

Journal Article•DOI•

Robust Bioinspired Architecture for Optical-Flow Computation

Guillermo Botella¹, Antonio García², Manuel Rodríguez-Álvarez², Eduardo Ros², Uwe Meyer-Baese³, María Molina¹•Institutions (3)

Complutense University of Madrid¹, University of Granada², Florida A&M University³

01 Apr 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel customizable architecture of a neuromorphic robust optical flow (multichannel gradient model) based on reconfigurable hardware with the properties of the cortical motion pathway is presented, thus obtaining a useful framework for building future complex bioinspired real-time systems with high computational complexity.

Abstract: Motion estimation from image sequences, called optical flow, has been deeply analyzed by the scientific community. Despite the number of different models and algorithms, none of them covers all problems associated with real-world processing. This paper presents a novel customizable architecture of a neuromorphic robust optical flow (multichannel gradient model) based on reconfigurable hardware with the properties of the cortical motion pathway, thus obtaining a useful framework for building future complex bioinspired real-time systems with high computational complexity. The presented architecture is customizable and adaptable, while emulating several neuromorphic properties, such as the use of several information channels of small bit width, which is the nature of the brain. This paper includes the resource usage and performance data, as well as a comparison with other systems. This hardware platform has many application fields in difficult environments due to its bioinspired nature and robustness properties, and it can be used as starting point in more complex systems.

Journal Article•DOI•

CMOS Bandgap References With Self-Biased Symmetrically Matched Current–Voltage Mirror and Extension of Sub-1-V Design

Yat-Hei Lam¹, Wing-Hung Ki¹•Institutions (1)

Hong Kong University of Science and Technology¹

01 Jun 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A series of bandgap references (BGRs) using a self-biased symmetrically matched current-voltage mirror (SM CVM) in reducing systematic offset, thus achieving an excellent line regulation, is presented.

Abstract: A series of bandgap references (BGRs) using a self-biased symmetrically matched current-voltage mirror (SM CVM) in reducing systematic offset, thus achieving an excellent line regulation, is presented. By replacing the operational amplifier with a CVM in the feedback loop, current consumption is much reduced. An SM buffer stage that is capable of driving a resistive load with minor degradation in temperature coefficient (TC) and line regulation is also presented. The technique is extended to design a sub-1-V BGR with a TC-cancellation output buffer. All circuits are designed using a 0.35-μm CMOS process, and experimental results are presented, confirming the analysis.

Journal Article•DOI•

Design of a CMOS Broadband Transimpedance Amplifier With Active Feedback

Zhenghao Lu¹, Kiat Seng Yeo¹, Wei Meng Lim¹, Manh Anh Do¹, Chirn Chye Boon¹•Institutions (1)

Nanyang Technological University¹

01 Mar 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel current-mode transimpedance amplifier exploiting the common gate input stage with common source active feedback, with low input impedance similar to that of the well-known regulated cascode (RGC) topology, is realized in CHRT 0.18 μm 1.8 V RFCMOS technology.

Abstract: In this paper, a novel current-mode transimpedance amplifier (TIA) exploiting the common gate input stage with common source active feedback has been realized in CHRT 0.18 μm 1.8 V RFCMOS technology. The proposed active feedback TIA input stage is able to achieve a low input impedance similar to that of the well-known regulated cascode (RGC) topology. The proposed TIA also employs series inductive peaking and capacitive degeneration techniques to enhance the bandwidth and the gain. The measured transimpedance gain is 54.6 dBΩ with a -3 dB bandwidth of about 7 GHz for a total input parasitic capacitance of 0.3 pF. The measured average input referred noise current spectral density is about 17.5 pA/√Hz up to 7 GHz. The measured group delay is within 65 ± 10 ps over the bandwidth of interest. The chip consumes 18.6 mW DC power from a single 1.8 V supply. The mathematical analysis of the proposed TIA is presented together with a detailed noise analysis based on the van der Ziel MOSFET noise model. The effect of the induced gate noise in a broadband TIA is included.

Journal Article•DOI•

A Discussion on SRAM Circuit Design Trend in Deeper Nanometer-Scale Technologies

Hiroyuki Yamauchi¹•Institutions (1)

Fukuoka Institute of Technology¹

01 May 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: It has been shown that the 6T SRAM cell will be allowed long reign, even in the 15-nm process node, if σVT can be suppressed to < 70 mV thanks to effective oxide thickness scaling for the low-standby-power process; otherwise, 10T and 8T with read-modify-write will be needed after σVT becomes > 85 and 75 mV, respectively.

Abstract: This paper compares area scaling capabilities of many kinds of SRAM margin-assist solutions for VT variability issues, which are based on various efforts by not only the cell topology changes from 6T to 8T and 10T but also incorporation of multiple voltage supply for cell terminal biasing and timing sequence controls of read and write. The various SRAM solutions are analyzed in light of an impact on the required area overhead for each design solution given by ever-increasing VT random variation (σVT), resulting in a slowdown in the SRAM scaling pace. In order to predict the area scaling trends among various SRAM solutions, two different σVT-increasing scenarios of being pessimistic and optimistic are assumed, where σVT becomes > 130 mV and suppressed to 85 and 75 mV, respectively.

Journal Article•DOI•

Design and Implementation of a Sort-Free K-Best Sphere Decoder

S. Mondal¹, Ahmed M. Eltawil², Chung-An Shen², Khaled N. Salama³•Institutions (3)

Rensselaer Polytechnic Institute¹, University of California, Irvine², King Abdullah University of Science and Technology³

01 Oct 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel sort-free approach to path extension, together with quantized metrics, results in a high-throughput VLSI architecture with lower power and area consumption compared to state-of-the-art published systems.

Abstract: This paper describes the design and very-large-scale integration (VLSI) architecture for a 4 × 4 breadth-first K-best multiple-input-multiple-output (MIMO) decoder using a 64 quadrature-amplitude modulation (QAM) scheme. A novel sort-free approach to path extension, together with quantized metrics, results in a high-throughput VLSI architecture with lower power and area consumption compared to state-of-the-art published systems. Functionality is confirmed via a field-programmable gate array (FPGA) implementation on a Xilinx Virtex II Pro FPGA. Comparisons of simulation and measurements are given, and FPGA utilization figures are provided. Finally, VLSI architectural tradeoffs are explored for a synthesized application-specific IC (ASIC) implementation in a 65-nm CMOS technology.

Journal Article•DOI•

Improving Multi-Level NAND Flash Memory Storage Reliability Using Concatenated BCH-TCM Coding

Shu Li¹, Tong Zhang¹•Institutions (1)

Rensselaer Polytechnic Institute¹

01 Oct 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A memory fault tolerance design solution geared to MLC NAND flash memories that concatenates trellis coded modulation (TCM) with an outer BCH code, which can greatly improve the error correction performance compared with the current design practice that uses BCH codes only.

Abstract: By storing more than one bit in each memory cell, multi-level per cell (MLC) NAND flash memories are dominating the global flash memory market due to their appealing storage density advantage. However, continuous technology scaling makes MLC NAND flash memories increasingly subject to worse raw storage reliability. This paper presents a memory fault tolerance design solution geared to MLC NAND flash memories. The basic idea is to concatenate trellis coded modulation (TCM) with an outer BCH code, which can greatly improve the error correction performance compared with the current design practice that uses BCH codes only. The key is that TCM can leverage the multi-level storage characteristic to reduce the memory bit error rate and hence relieve the burden on the outer BCH code, at no cost of extra redundant memory cells. The superior performance of such concatenated BCH-TCM coding systems for MLC NAND flash memories has been demonstrated through computer simulations. A modified TCM demodulation approach is further proposed to improve the tolerance to static memory cell defects. We also address the associated practical implementation issues when using either a single-page or a multi-page programming strategy, and demonstrate the silicon implementation efficiency through application-specific integrated circuit design at the 65 nm node.

Journal Article•DOI•

Computation Error Analysis in Digital Signal Processing Systems With Overscaled Supply Voltage

Yang Liu¹, Tong Zhang¹, Keshab K. Parhi²•Institutions (2)

Rensselaer Polytechnic Institute¹, University of Minnesota²

01 Apr 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: An analytical method to estimate the statistics of computer arithmetic computation errors due to supply voltage overscaling is presented and can be used to choose the appropriate computer arithmetic architecture in voltage-overscaled signal processing systems.

Abstract: It has been recently demonstrated that digital signal processing systems may possibly leverage unconventional voltage overscaling (VOS) to reduce energy consumption while maintaining satisfactory signal processing performance. Due to the computation-intensive nature of most signal processing algorithms, the energy saving potential largely depends on the behavior of computer arithmetic units in response to overscaled supply voltage. This paper shows that different hardware implementations of the same computer arithmetic function may respond to VOS very differently and result in different energy saving potentials. Therefore, the selection of appropriate computer arithmetic architecture is an important issue in voltage-overscaled signal processing system design. This paper presents an analytical method to estimate the statistics of computer arithmetic computation errors due to supply voltage overscaling. Compared with computation-intensive circuit simulations, this analytical approach can be several orders of magnitude faster and can achieve a reasonable accuracy. This approach can be used to choose the appropriate computer arithmetic architecture in voltage-overscaled signal processing systems. Finally, we carry out case studies on a coordinate rotation digital computer processor and a finite-impulse-response filter to further demonstrate the importance of choosing proper computer arithmetic implementations.

Journal Article•DOI•

A New VLSI Architecture of Parallel Multiplier–Accumulator Based on Radix-2 Modified Booth Algorithm

Young-Ho Seo¹, Dong-Wook Kim¹•Institutions (1)

Kwangwoon University¹

01 Feb 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The proposed MAC showed superior properties to the standard design in many ways and delivered twice the performance of previous research at a similar clock frequency.

Abstract: In this paper, we proposed a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator that has the largest delay in MAC was merged into CSA, the overall performance was elevated. The proposed CSA tree uses 1's-complement-based radix-2 modified Booth's algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits instead of the output of the final adder, which made it possible to optimize the pipeline scheme to improve the performance. The proposed architecture was synthesized with 250-, 180-, 130-, and 90-nm standard CMOS libraries. Based on the theoretical and experimental estimation, we analyzed the results such as the amount of hardware resources, delay, and pipelining scheme. We used Sakurai's alpha power law for the delay modeling. The proposed MAC showed superior properties to the standard design in many ways and delivered twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance such as the signal processing areas.

Journal Article•DOI•

SRAM Read/Write Margin Enhancements Using FinFETs

A. Carlson¹, Zheng Guo¹, Sriram Balasubramanian¹, Radu Zlatanovici¹, Tiehui Liu¹, Borivoje Nikolic¹•Institutions (1)

University of California, Berkeley¹

01 Jun 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: It is shown that FinFET-based 6-T SRAM cells designed with pass-gate feedback (PGFB) achieve significant improvements in the cell read stability without area penalty, allowing for simultaneous read and write yield enhancements when the PGFB and PUWG designs are used in combination.

Abstract: Process-induced variations and sub-threshold leakage in bulk-Si technology limit the scaling of SRAM into sub-32 nm nodes. New device architectures are being considered to improve control and reduce short channel effects. Among the likely candidates, FinFETs are the most attractive option because of their good scalability and possibilities for further SRAM performance and yield enhancement through independent gating. The enhancements to read/write margins and yield are investigated in detail for two cell designs employing independently gated FinFETs. It is shown that FinFET-based 6-T SRAM cells designed with pass-gate feedback (PGFB) achieve significant improvements in the cell read stability without area penalty. The write-ability of the cell can be improved through the use of pull-up write gating (PUWG) with a separate write word line (WWL). The benefits of these two approaches are complementary and additive, allowing for simultaneous read and write yield enhancements when the PGFB and PUWG designs are used in combination.

Journal Article•DOI•

Single- and Multi-core Configurable AES Architectures for Flexible Security

Mao-Yin Wang¹, Chih-Pin Su¹, Chia-Lung Horng¹, Cheng-Wen Wu¹, Chih-Tsun Huang¹•Institutions (1)

National Tsing Hua University¹

01 Apr 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The architecture performs encryption and decryption of large data with 128-b key in CBC mode using on-the-fly key generation and composite field S-box, making it more cost effective (with better thousand-gate/gigabit-per-second ratio) than conventional methods.

Abstract: As networking technology advances, the gap between network bandwidth and network processing power widens. Information security issues add to the need for developing high-performance network processing hardware, particularly that for real-time processing of cryptographic algorithms. This paper presents a configurable architecture for Advanced Encryption Standard (AES) encryption, whose major building blocks are a group of AES processors. Each AES processor provides 219 block cipher schemes with a novel on-the-fly key expansion design for the original AES algorithm and an extended AES algorithm. In this multicore architecture, the memory controller of each AES processor is designed for the maximum overlapping between data transfer and encryption, reducing interrupt handling load of the host processor. This design can be applied to high-speed systems since its independent data paths greatly reduce the input/output bandwidth problem. A test chip has been fabricated for the AES architecture, using a standard 0.25-μm CMOS process. It has a silicon area of 6.29 mm², containing about 200,500 logic gates, and runs at a 66-MHz clock. In electronic codebook (ECB) and cipher-block chaining (CBC) cipher modes, the throughput rates are 844.9, 704, and 603.4 Mb/s for 128-, 192-, and 256-b keys, respectively. In order to achieve 1-Gb/s throughput (including overhead) at the worst case, we design a multicore architecture containing three AES processors with 0.18-μm CMOS process. The throughput rate of the architecture is between 1.29 and 3.75 Gb/s at 102 MHz. The architecture performs encryption and decryption of large data with 128-b key in CBC mode using on-the-fly key generation and composite field S-box, making it more cost effective (with better thousand-gate/gigabit-per-second ratio) than conventional methods.
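
The single-core throughput figures above are consistent with a simple back-of-the-envelope model. The sketch below is an illustration, not the paper's own analysis: it assumes one 128-bit block completes every N clock cycles, where N is the standard AES round count (10, 12, or 14 for 128-, 192-, and 256-bit keys), and that the clock is exactly 66 MHz. The function name and the one-round-per-cycle assumption are hypothetical.

```python
BLOCK_BITS = 128                          # AES block size in bits
CLOCK_HZ = 66e6                           # test-chip clock quoted in the abstract
ROUNDS = {128: 10, 192: 12, 256: 14}      # standard AES round counts per key size

def throughput_mbps(key_bits, clock_hz=CLOCK_HZ):
    """Throughput in Mb/s, assuming one round per clock cycle (an assumption)."""
    cycles_per_block = ROUNDS[key_bits]
    return BLOCK_BITS * clock_hz / cycles_per_block / 1e6

for key_bits in (128, 192, 256):
    print(f"{key_bits}-bit key: {throughput_mbps(key_bits):.1f} Mb/s")
```

Under these assumptions the model lands within about 0.1 Mb/s of the reported 844.9, 704, and 603.4 Mb/s, which suggests (but does not prove) a one-round-per-cycle datapath.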

Journal Article•DOI•

Leakage–Delay Tradeoff in FinFET Logic Circuits: A Comparative Analysis With Bulk Technology

Matteo Agostinelli¹, Massimo Alioto², David Esseni³, Luca Selmi³•Institutions (3)

Alpen-Adria-Universität Klagenfurt¹, University of Siena², University of Udine³

01 Feb 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The results show that, thanks to a larger threshold voltage sensitivity to back biasing, the FinFET technology is able to offer a more favorable compromise between standby power consumption and dynamic performance and is well suited for implementing fast and energy-efficient adaptive back-biasing strategies.

Abstract: In this paper, we study the advantages offered by multi-gate fin FETs (FinFETs) over traditional bulk MOSFETs when low standby power circuit techniques are implemented. More precisely, we simulated various vehicle circuits, ranging from ring oscillators to mirror full adders, to investigate the effectiveness of back biasing and transistor-stacking in both FinFETs and bulk MOSFETs. The opportunity to separate the gates of FinFETs and to operate them independently has been systematically analyzed; mixed connected- and independent-gate circuits have also been evaluated. The study spans over the device, the layout, and the circuit level of abstraction and appropriate figures of merit are introduced to quantify the potential advantage of different schemes. Our results show that, thanks to a larger threshold voltage sensitivity to back biasing, the FinFET technology is able to offer a more favorable compromise between standby power consumption and dynamic performance and is well suited for implementing fast and energy-efficient adaptive back-biasing strategies.

Journal Article•DOI•

A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates

Hamed F. Dadgour¹, Kaustav Banerjee¹•Institutions (1)

University of California, Santa Barbara¹

01 Nov 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel variation-tolerant keeper architecture is proposed, which is capable of significantly reducing contention and improving performance and power consumption and exhibits the lowest delay deviation under different levels of process variations.

Abstract: Dynamic gates have been an excellent choice in the design of high-performance modules in modern microprocessors. The only limitation of dynamic gates is their relatively low noise margin compared to that of standard CMOS gates. Traditionally, this issue has been resolved by employing a pMOS keeper circuit that compensates for leakage current of the pull-down nMOS network. In the earlier technology nodes, the keeper circuit could improve reliability of the dynamic gates with minor performance penalty. However, aggressive scaling trends of CMOS technology along with increasing levels of process variations have reduced the effectiveness of the traditional keeper approach. This is because, to maintain an acceptable noise margin level in deep sub-100 nm technologies, large pMOS keepers must be employed, which generates substantial contention between the keeper and the pull-down network and hence results in severe loss of performance and high power consumption. This problem is more severe in wide fan-in dynamic gates due to the large number of leaky nMOS devices connected to the dynamic node. In this paper, a novel variation-tolerant keeper architecture is proposed, which is capable of significantly reducing contention and improving performance and power consumption. Using circuit simulations, the overall improved characteristics of the proposed keeper are demonstrated in comparison to those of the traditional as well as several state-of-the-art keepers. The proposed keeper exhibits the lowest delay deviation under different levels of process variations. Also, it is shown that for an eight-input OR gate, in the presence of 15% Vth fluctuations, the proposed architecture can lead to 20%, 15%, and more than 40% reduction in power consumption, mean delay, and standard deviation of delay, respectively, when compared to the traditional keeper circuit.

Journal Article•DOI•

Performability/Energy Tradeoff in Error-Control Schemes for On-Chip Networks

Alireza Ejlali¹, Bashir M. Al-Hashimi², P. Rosinger², Seyed Ghassem Miremadi¹, Luca Benini³•Institutions (3)

Sharif University of Technology¹, University of Southampton², University of Bologna³

01 Jan 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: It is argued in this paper that the use of error-control schemes in on-chip networks results in degradable systems, hence, performance and reliability must be measured jointly using a unified measure, i.e., performability.

Abstract: High reliability against noise, high performance, and low energy consumption are key objectives in the design of on-chip networks. Recently some researchers have considered the impact of various error-control schemes on these objectives and on the tradeoff between them. In all these works performance and reliability are measured separately. However, we will argue in this paper that the use of error-control schemes in on-chip networks results in degradable systems; hence, performance and reliability must be measured jointly using a unified measure, i.e., performability. Based on the traditional concept of performability, we provide a definition for the "Interconnect Performability". Analytical models are developed for interconnect performability and expected energy consumption. A detailed comparative analysis of the error-control schemes using the performability analytical models and SPICE simulations is provided, taking into consideration voltage swing variations (used to reduce interconnect energy consumption) and variations in wire length. Furthermore, the impact of noise power and time constraint on the effectiveness of error-control schemes is analyzed.

Journal Article•DOI•

Test Data Compression Using Efficient Bitmask and Dictionary Selection Methods

Kanad Basu¹, Prabhat Mishra¹•Institutions (1)

University of Florida¹

01 Sep 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel test data compression technique using bitmasks which provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty is proposed.

Abstract: Higher circuit densities in system-on-chip (SOC) designs have led to a drastic increase in test data volume. Larger test data size demands not only higher memory requirements, but also an increase in testing time. Test data compression addresses this problem by reducing the test data volume without affecting the overall system performance. This paper proposes a novel test data compression technique using bitmasks which provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty. The major contributions of this paper are as follows: 1) it develops an efficient bitmask selection technique for test data in order to create maximum matching patterns; 2) it develops an efficient dictionary selection method which takes into account the bitmask based compression; and 3) it proposes a test compression technique using efficient dictionary and bitmask selection to significantly reduce the testing time and memory requirements. We have applied our method on various test data sets and compared our results with other existing test compression techniques. Our algorithm outperforms existing dictionary-based approaches by up to 30%, giving a best possible test compression of 92%.

Journal Article•DOI•

On-Chip Variability Sensor Using Phase-Locked Loop for Detecting and Correcting Parametric Timing Failures

Kunhyuk Kang¹, Sang Phill Park², Keejong Kim³, Kaushik Roy²•Institutions (3)

Intel¹, Purdue University², Broadcom³

01 Feb 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The analysis shows that control voltage (Vcnt) of voltage-controlled oscillator in PLL can be used as a dynamic performance signature of an operating IC and a variation-resilient system technique using adaptive body biasing (ABB) is proposed.

Abstract: Performance variability in digital integrated circuits can largely affect parametric yield and product reliability in ultra deep submicrometer technologies. As a result, variation resilience is becoming an essential design requirement for future technology nodes, especially for timing critical applications. This paper proposes an on-chip variability sensor using phase-locked loop (PLL) to detect process, supply voltage (VDD), and temperature variations (process, voltage, and temperature variation) or even temporal reliability degradation stemming from negative bias temperature instability. Our analysis shows that control voltage (Vcnt) of voltage-controlled oscillator in PLL can be used as a dynamic performance signature of an operating IC. Along with the proposed PLL-based sensor circuit, we also propose a variation-resilient system technique using adaptive body biasing (ABB). The PLL Vcnt signal is efficiently transformed to an optimal body bias signal for various circuit blocks to avoid possible timing failures. Correspondingly, circuits can be designed with significantly relaxed timing constraint compared to conventional approaches, where a large amount of design resources can be wasted to take care of the worst-case situations. We demonstrated our approach on a test chip fabricated in IBM 130-nm CMOS technology. Measurement results show that the PLL-based sensor is capable of tracking various sources of circuit variations. Optimization analysis shows that 42% and 43% reduction in area and power can be obtained using our approach compared to the worst-case sizing. The proposed study builds on our previous study, with major improvements in measurement results and analysis.
