Author

Behnam Amelifard

Bio: Behnam Amelifard is an academic researcher from Qualcomm. The author has contributed to research in topics including low-power electronics and CMOS, has an h-index of 12, and has co-authored 28 publications receiving 488 citations. Previous affiliations of Behnam Amelifard include the University of Southern California and the University of Tehran.

Papers
Journal ArticleDOI
TL;DR: This paper presents a method based on dual-Vt and dual-Tox assignment to reduce the total leakage power dissipation of static random access memories (SRAMs) while maintaining their performance.
Abstract: Aggressive CMOS scaling results in low threshold voltage and thin oxide thickness for transistors manufactured in the deep submicrometer regime. As a result, reducing the subthreshold and tunneling gate leakage currents has become one of the most important criteria in the design of VLSI circuits. This paper presents a method based on dual-Vt and dual-Tox assignment to reduce the total leakage power dissipation of static random access memories (SRAMs) while maintaining their performance. The proposed method is based on the observation that read and write delays of a memory cell in an SRAM block depend on the physical distance of the cell from the sense amplifier and the decoder. Thus, the idea is to deploy different configurations of six-transistor SRAM cells corresponding to different threshold voltage and oxide thickness assignments for the transistors. Unlike other techniques for low-leakage SRAM design, the proposed technique incurs neither area nor delay overhead. In addition, it results in a minor change in the SRAM design flow. The leakage saving achieved by using this technique is a function of the values of the high threshold voltage and the oxide thickness, as well as the number of rows and columns in the cell array. Simulation results with a 65-nm process demonstrate that this technique can reduce the total leakage power dissipation of a 64 × 512 SRAM array by 33% and that of a 32 × 512 SRAM array by 40%.
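To make the assignment idea concrete, the sketch below labels each cell position in a toy array as either a fast (low-Vt, thin-Tox) or a low-leakage (high-Vt, thick-Tox) configuration, giving the low-leakage variant to positions whose distance-induced timing slack can absorb its extra delay. This is only an illustrative reading of the observation in the abstract, not the paper's actual algorithm; the function name, delay parameters, and slack threshold are all assumed.

    # Illustrative sketch (not the paper's algorithm): give a low-leakage
    # (high-Vt, thick-Tox) configuration to array positions whose timing
    # slack can absorb the slower cell; keep the fast configuration elsewhere.

    def assign_cell_configs(rows, cols, delay_per_row, delay_per_col, slack_needed):
        """Return a rows x cols grid of 'low_leakage' / 'fast' labels.

        delay_per_row / delay_per_col: assumed incremental wire delay per cell
        of distance from the sense amplifier and the decoder.
        slack_needed: assumed extra delay of the low-leakage configuration.
        """
        grid = []
        for r in range(rows):
            row_labels = []
            for c in range(cols):
                # Cells close to the sense amplifier/decoder see less wire delay,
                # so they have slack that a slower, low-leakage cell can use.
                wire_delay = r * delay_per_row + c * delay_per_col
                worst_case = (rows - 1) * delay_per_row + (cols - 1) * delay_per_col
                slack = worst_case - wire_delay
                row_labels.append("low_leakage" if slack >= slack_needed else "fast")
            grid.append(row_labels)
        return grid

    if __name__ == "__main__":
        grid = assign_cell_configs(rows=8, cols=8,
                                   delay_per_row=1.0, delay_per_col=0.5,
                                   slack_needed=4.0)
        low = sum(label == "low_leakage" for row in grid for label in row)
        print(f"{low}/64 cells use the low-leakage configuration")

With these invented numbers the sketch labels 44 of the 64 positions low-leakage; in the paper, the achievable saving depends on the array dimensions and on the chosen high threshold voltage and thick oxide values, as the abstract notes.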

69 citations

Proceedings ArticleDOI
21 Mar 2005
TL;DR: A new design of a low-power high-performance adder is presented that is faster than a ripple carry adder (RCA), but slower than a CSA.
Abstract: Based on the idea of sharing two adders used in the carry select adder (CSA), a new design of a low-power high-performance adder is presented. The new adder is faster than a ripple carry adder (RCA), but slower than a CSA. On the other hand, its area and power dissipation are smaller than those of a CSA.
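For reference, the behavioral model below shows the structure the paper starts from: a conventional carry-select block computes the block sum twice, once for each possible carry-in, and the real carry-in selects the result. The paper's adder shares hardware between those two speculative additions; that sharing is not modeled here, and the bit widths and test values are arbitrary.

    # Behavioral model of one conventional carry-select block: two ripple adds
    # (carry-in 0 and carry-in 1) run in parallel, and the real carry-in picks
    # the result. This only sketches the baseline structure being optimized.

    def ripple_add(a_bits, b_bits, carry_in):
        """Add two equal-length little-endian bit lists; return (sum_bits, carry_out)."""
        result, carry = [], carry_in
        for a, b in zip(a_bits, b_bits):
            result.append(a ^ b ^ carry)
            carry = (a & b) | (carry & (a ^ b))
        return result, carry

    def carry_select_block(a_bits, b_bits, carry_in):
        sum0, cout0 = ripple_add(a_bits, b_bits, 0)   # speculative: carry-in = 0
        sum1, cout1 = ripple_add(a_bits, b_bits, 1)   # speculative: carry-in = 1
        return (sum1, cout1) if carry_in else (sum0, cout0)

    if __name__ == "__main__":
        a, b = [1, 0, 1, 1], [1, 1, 0, 1]             # 13 + 11, little-endian
        s, cout = carry_select_block(a, b, carry_in=0)
        value = sum(bit << i for i, bit in enumerate(s)) + (cout << len(s))
        print(value)                                   # 24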

54 citations

Proceedings ArticleDOI
06 Mar 2006
TL;DR: Simulation results with a 65 nm process demonstrate that this technique can reduce the total leakage power dissipation of a 64 Kb SRAM by more than 50% and incurs neither area nor delay overhead.
Abstract: Aggressive CMOS scaling results in low threshold voltage and thin oxide thickness for transistors manufactured in the very deep submicron regime. As a result, reducing the subthreshold and gate-tunneling leakage currents has become one of the most important criteria in the design of VLSI circuits. This paper presents a method based on dual-Vt and dual-Tox assignment to reduce the total leakage power dissipation of SRAMs while maintaining their performance. The proposed method is based on the observation that the read and write delays of a memory cell in an SRAM block depend on the physical distance of the cell from the sense amplifier and the decoder. Thus, the idea is to deploy different types of six-transistor SRAM cells corresponding to different threshold voltage and oxide thickness assignments for the transistors. Unlike other techniques for low-leakage SRAM design, the proposed technique incurs neither area nor delay overhead. In addition, it results in a minor change in the SRAM design flow. Simulation results with a 65 nm process demonstrate that this technique can reduce the total leakage power dissipation of a 64 Kb SRAM by more than 50%.
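A toy bookkeeping example of where such savings come from: total array leakage is the sum of per-cell subthreshold and gate-tunneling components, so replacing part of the array with a high-Vt, thick-Tox cell variant scales the total down by the fraction replaced and the per-cell leakage ratio. The per-cell currents and the 50/50 split below are invented for illustration and are not taken from the paper.

    # Toy leakage accounting with invented per-cell currents: total leakage is
    # the sum of subthreshold and gate-tunneling components over all cells, so
    # swapping in low-leakage cells reduces the total in proportion to the
    # fraction replaced and the per-cell leakage ratio.

    LEAKAGE_PER_CELL_PA = {
        "fast":        {"subthreshold": 100.0, "gate": 60.0},   # low Vt, thin Tox
        "low_leakage": {"subthreshold": 20.0,  "gate": 15.0},   # high Vt, thick Tox
    }

    def total_leakage_pa(cell_counts):
        """Sum per-cell leakage components over a mix of cell configurations."""
        return sum(count * sum(LEAKAGE_PER_CELL_PA[kind].values())
                   for kind, count in cell_counts.items())

    if __name__ == "__main__":
        baseline = total_leakage_pa({"fast": 64 * 512})
        mixed = total_leakage_pa({"fast": 32 * 512, "low_leakage": 32 * 512})
        print(f"saving: {100 * (1 - mixed / baseline):.0f}%")

In the paper the actual mix of cell types follows from the slack-based assignment and the array geometry rather than a fixed split.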

49 citations

Proceedings ArticleDOI
04 May 2008
TL;DR: First, it is shown that NBTI tightens the setup and hold timing constraints imposed on the flip-flops in the design, and an NBTI-aware transistor sizing technique can minimize the NBTI effect on the timing characteristics of the flip-flops.
Abstract: With the scaling down of CMOS technologies, Negative Bias Temperature Instability (NBTI) has become a major concern due to its impact on the PMOS transistor aging process and the corresponding reduction in the long-term reliability of CMOS circuits. This paper investigates the effect of the NBTI phenomenon on the setup and hold times of flip-flops. First, it is shown that NBTI tightens the setup and hold timing constraints imposed on the flip-flops in the design. Second, it is shown that different types of flip-flops exhibit different levels of susceptibility to NBTI-induced changes in their setup/hold time values. Finally, it is shown that an NBTI-aware transistor sizing technique can minimize the NBTI effect on the timing characteristics of the flip-flops.
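As a first-order illustration of why aging tightens these constraints, the sketch below scales a flip-flop's setup time by the delay increase implied by an NBTI-induced threshold-voltage shift, using a generic alpha-power delay dependence. The model, the parameter values, and the 40 mV shift are all invented for illustration; the paper's analysis of specific flip-flop topologies is not reproduced here.

    # First-order illustration (not the paper's model): NBTI raises PMOS |Vth|
    # over time, which slows the internal paths of a flip-flop and therefore
    # tightens its setup/hold requirements. Delay sensitivity is taken from a
    # generic alpha-power-law dependence with invented parameters.

    def delay_scale(vdd, vth, alpha=1.3):
        """Relative gate delay ~ Vdd / (Vdd - Vth)^alpha (alpha-power law)."""
        return vdd / (vdd - vth) ** alpha

    def aged_setup_time(setup_fresh_ps, vdd, vth_fresh, delta_vth):
        """Scale the fresh setup time by the NBTI-induced delay increase."""
        scale = delay_scale(vdd, vth_fresh + delta_vth) / delay_scale(vdd, vth_fresh)
        return setup_fresh_ps * scale

    if __name__ == "__main__":
        # Invented numbers: 1.0 V supply, 0.3 V fresh |Vth|, 40 mV NBTI shift.
        fresh = 50.0
        aged = aged_setup_time(fresh, vdd=1.0, vth_fresh=0.30, delta_vth=0.04)
        print(f"setup time: {fresh:.1f} ps fresh -> {aged:.1f} ps aged")

The hold-time constraint tightens through the same mechanism, as the abstract notes.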

47 citations

Journal ArticleDOI
TL;DR: It is shown that, in a SoC design with static-voltage assignment, a multilevel tree topology of suitably chosen dc-dc converters between the power source and loads can result in higher power efficiency in the PDN.
Abstract: This paper introduces techniques for power-efficient design of power-delivery network (PDN) in multiple voltage-island system-on-chip (SoC) designs. The first technique is targeted to SoC designs with static-voltage assignment, while the second technique is pertinent to SoC designs with dynamic-voltage scaling (DVS) capability. Conventionally, a single-level configuration of dc-dc converters, where exactly one converter resides between the power source and each load, is used to deliver currents at appropriate voltage levels to different loads on the chip. In the presence of DVS capability, each dc-dc converter in this network should be able to adjust its output voltage. In the first part of this paper, it is shown that, in a SoC design with static-voltage assignment, a multilevel tree topology of suitably chosen dc-dc converters between the power source and loads can result in higher power efficiency in the PDN. The problem is formulated as a combinatorial problem and is efficiently solved by dynamic programming. In the second part of this paper, a new technique is presented to design the PDN for a SoC design to support DVS. In this technique, the PDN is composed of two layers. In the first layer, dc-dc converters with fixed output voltages are used to generate all voltage levels that are needed by different loads in the SoC design. In the second layer of the PDN, a power-switch network is used to dynamically connect the power-supply terminals of each load to the appropriate dc-dc converter output in the first layer. Experimental results demonstrate the efficacy of both techniques.
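The sketch below illustrates, with an invented converter-efficiency model, why a multilevel tree can deliver the same load power from less input power than a one-converter-per-load configuration. The paper formulates the actual topology selection as a combinatorial optimization solved by dynamic programming; this sketch only brute-forces a single two-level alternative, and the voltages, load powers, and efficiency model are assumptions.

    # Toy illustration of why a multilevel converter tree can beat one-level
    # conversion. Invented model: a converter is 95% efficient for step-down
    # ratios up to 2:1 and degrades like a linear regulator beyond that.

    def eta(v_in, v_out):
        return 0.95 if v_in / v_out <= 2.0 else v_out / v_in

    def single_level_power(v_src, loads):
        # One dedicated converter from the source to each load.
        return sum(p / eta(v_src, v) for v, p in loads)

    def two_level_power(v_src, v_bus, loads):
        # First level: source -> intermediate bus; second level: bus -> loads.
        bus_power = sum(p / eta(v_bus, v) for v, p in loads)
        return bus_power / eta(v_src, v_bus)

    if __name__ == "__main__":
        v_src, loads = 3.3, [(1.0, 2.0), (1.2, 1.5)]   # (volts, watts) per island
        print(f"single-level input power: {single_level_power(v_src, loads):.2f} W")
        best_bus = min((1.8, 2.0, 2.4),
                       key=lambda vb: two_level_power(v_src, vb, loads))
        print(f"two-level via {best_bus} V bus: "
              f"{two_level_power(v_src, best_bus, loads):.2f} W")

With this invented model the shared intermediate bus keeps every converter inside its efficient conversion range, which is the kind of trade-off the paper's dynamic program explores over full converter trees.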

47 citations


Cited by
Journal ArticleDOI
TL;DR: A mathematical bit error rate (BER) model for upsets in memories protected by error-correcting codes (ECCs) and scrubbing is derived and is compared with expected upset rates for sub-100-nm SRAM memories in space environments.
Abstract: A mathematical bit error rate (BER) model for upsets in memories protected by error-correcting codes (ECCs) and scrubbing is derived. This model is compared with expected upset rates for sub-100-nm SRAM memories in space environments. Because sub-100-nm SRAM memory cells can be upset by a critical charge (Qcrit) of 1.1 fC or less, they may exhibit significantly higher upset rates than those reported in earlier technologies. Because of this, single-bit-correcting ECCs may become impractical due to memory scrubbing rate limitations. The overhead needed for protecting memories with a triple-bit-correcting ECC is examined relative to an approximate 2X "process generation" scaling penalty in area, speed, and power.
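A simplified version of this style of analysis (not the paper's exact model) treats a codeword as failing when more than t of its n bits are upset between scrubs, with per-bit upsets accumulating at a constant rate. The code below computes that binomial tail; the codeword length, upset rate, and scrub interval are illustrative numbers, and single-event multi-bit upsets are ignored.

    # Simplified ECC + scrubbing model: a codeword fails when more than t of
    # its n bits are upset within one scrub interval. Per-bit upset probability
    # in an interval is p = 1 - exp(-rate * T); word failure probability is the
    # binomial tail beyond t.

    import math

    def word_failure_prob(n_bits, t_correctable, upset_rate_per_bit, scrub_interval):
        p = 1.0 - math.exp(-upset_rate_per_bit * scrub_interval)
        return sum(math.comb(n_bits, k) * p**k * (1 - p)**(n_bits - k)
                   for k in range(t_correctable + 1, n_bits + 1))

    if __name__ == "__main__":
        # Invented numbers: 72-bit codeword, 1e-9 upsets/bit/s, 10 s scrub period.
        for t in (1, 3):
            prob = word_failure_prob(72, t, upset_rate_per_bit=1e-9, scrub_interval=10.0)
            print(f"t = {t}: word failure probability per scrub interval = {prob:.3e}")

Even this toy model shows the levers discussed in the abstract: raising the correction capability t or scrubbing more often pushes the failure probability down, while higher per-bit upset rates in scaled SRAMs push it up.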

110 citations

Journal ArticleDOI
TL;DR: The impact of process parameter variations on various design metrics of the proposed cell is presented and compared with conventional differential 6T (D6T), transmission gate-based 8T (TG8T), and single-ended 8T (SE8T) SRAM cells.
Abstract: Low-power and noise-tolerant static random access memory (SRAM) cells are in high demand today. This paper presents a stable differential SRAM cell that consumes low power. The proposed cell has a structure similar to the conventional 6T SRAM cell with the addition of two buffer transistors, one tail transistor, and one complementary word line. Due to the stacking effect, the proposed cell achieves lower power dissipation. In this paper, the impact of process parameter variations on various design metrics of the proposed cell is presented and compared with conventional differential 6T (D6T), transmission gate-based 8T (TG8T), and single-ended 8T (SE8T) SRAM cells. The impact of process variations, such as threshold voltage and length, on design metrics of an SRAM cell, namely read static noise margin (RSNM), read access time (T_RA), and write access time (T_WA), is also presented. The proposed cell achieves a 1.12×/1.43×/5.62× improvement in T_RA compared to TG8T/D6T/SE8T, at a penalty of 1.1×/4.88× in T_WA compared to D6T/TG8T and 1.19×/1.18× in read/write power consumption compared to D6T. An improvement of 1.12×/2.15× in RSNM is observed compared to D6T/TG8T. The proposed cell consumes 5.38× less power during hold mode and also shows a 2.33× narrower spread in hold power at V_DD = 0.4 V compared to the D6T SRAM cell.

110 citations

Proceedings ArticleDOI
13 May 2013
TL;DR: CSA is one of the fastest adders used in many data-processing systems to perform fast arithmetic operations; the importance of BEC logic comes from the large silicon-area reduction it offers when designing an MCSA for a large number of bits.
Abstract: This paper describes a power- and area-efficient carry select adder (CSA). Firstly, the CSA is one of the fastest adders used in many data-processing systems to perform fast arithmetic operations. Secondly, the CSA is intermediate between the ripple carry adder (RCA), which has a small area but longer delay, and the carry look-ahead adder, which has a larger area but shorter delay. Thirdly, there is still scope to reduce the CSA's area by introducing an add-one scheme. In the modified carry select adder (MCSA) design, a single RCA and a binary-to-excess-1 converter (BEC) are used instead of dual RCAs to reduce area and power consumption with a small speed penalty. The area reduction comes from the fact that a BEC requires fewer logic gates than an RCA. Thus, the importance of the BEC logic comes from the large silicon-area reduction when designing an MCSA for a large number of bits. MCSA architectures are designed for 8-bit, 16-bit, 32-bit, and 64-bit word lengths. The design has been synthesized targeting a 90 nm Xilinx Spartan-3 device. Comparison of the modified CSA with the conventional CSA shows improvements in area and power.
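The core MCSA trick can be seen in a few lines: because the block result for carry-in 1 is just the carry-in-0 result plus one, a single ripple addition followed by a binary-to-excess-1 step can replace the second RCA of a conventional CSA. The behavioral sketch below is illustrative only; it models neither the gate-level BEC nor the synthesized 8/16/32/64-bit designs, and the function names and test values are assumed.

    # Behavioral sketch of the BEC-based carry-select idea: compute the block
    # sum once with carry-in 0, derive the carry-in-1 result with a
    # binary-to-excess-1 converter (add one), then select with the real carry.

    def binary_to_excess_1(bits):
        """Add 1 to a little-endian bit list, returning (bits, carry_out)."""
        out, carry = [], 1
        for b in bits:
            out.append(b ^ carry)
            carry = b & carry
        return out, carry

    def mcsa_block(a_bits, b_bits, carry_in):
        # Single ripple add with carry-in 0 (the shared hardware in the MCSA).
        sum0, carry0 = [], 0
        for a, b in zip(a_bits, b_bits):
            sum0.append(a ^ b ^ carry0)
            carry0 = (a & b) | (carry0 & (a ^ b))
        # The BEC replaces the second ripple adder of a conventional CSA.
        sum1, inc_carry = binary_to_excess_1(sum0)
        carry1 = carry0 | inc_carry
        return (sum1, carry1) if carry_in else (sum0, carry0)

    if __name__ == "__main__":
        a, b = [1, 1, 1, 1], [0, 0, 0, 0]          # 15 + 0, little-endian
        print(mcsa_block(a, b, carry_in=1))        # ([0, 0, 0, 0], 1) -> 16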

89 citations