Memory isolation is a key property of a reliable and secure computing system: an access to one memory address should not have unintended side effects on data stored in other addresses. However, as DRAM process technology scales down to smaller dimensions, it becomes more difficult to prevent DRAM cells from electrically interacting with each other. In this paper, we expose the vulnerability of commodity DRAM chips to disturbance errors. By reading from the same address in DRAM, we show that it is possible to corrupt data in nearby addresses. More specifically, activating the same row in DRAM corrupts data in nearby rows. We demonstrate this phenomenon on Intel and AMD systems using a malicious program that generates many DRAM accesses. We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers. From this we conclude that many deployed systems are likely to be at risk. We identify the root cause of disturbance errors as the repeated toggling of a DRAM row's wordline, which stresses inter-cell coupling effects that accelerate charge leakage from nearby rows. We provide an extensive characterization study of disturbance errors and their behavior using an FPGA-based testing platform. Among our key findings, we show that (i) it takes as few as 139K accesses to induce an error and (ii) up to one in every 1.7K cells is susceptible to errors. After examining various potential ways of addressing the problem, we propose a low-overhead solution to prevent the errors.

/pdf/flipping-bits-in-memory-without-accessing-them-an-3dirvwezou.pdf

Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly. Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement. We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increases in power consumption that result from increases in temperature. With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption, and on average 10.99%, for the five memory bandwidth-limited benchmarks.

https://www.cc.gatech.edu/~hyesoon/hong_isca10.pdf

An integrated GPU power and performance model
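
The bandwidth-saturation intuition in the abstract above can be illustrated with a minimal sketch. This is not the IPP model itself, which predicts execution time and power; it only shows why adding cores stops helping once aggregate memory demand reaches the peak bandwidth. The function name, the parameters, and the assumption that every core demands the same fixed bandwidth are all hypothetical.

```python
def optimal_active_cores(per_core_bw_gbs, peak_bw_gbs, max_cores):
    """Return the smallest core count whose aggregate bandwidth demand
    reaches the peak memory bandwidth (hypothetical simplification).

    Assumes each active core demands a fixed per_core_bw_gbs; beyond the
    returned count, extra cores would only add power, not performance.
    """
    for n in range(1, max_cores + 1):
        if n * per_core_bw_gbs >= peak_bw_gbs:
            return n
    # The application never saturates memory: use every core.
    return max_cores


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(optimal_active_cores(per_core_bw_gbs=10.0, peak_bw_gbs=80.0, max_cores=30))
```

A compute-bound application (small per-core demand) falls through to `max_cores`, which matches the abstract's point that the savings apply to memory bandwidth-limited benchmarks.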

In this paper, we have analyzed and modeled failure probabilities (access-time failure, read/write failure, and hold failure) of static random-access memory (SRAM) cells due to process-parameter variations. A method to predict the yield of a memory chip based on the cell-failure probability is proposed. A methodology to statistically design the SRAM cell and the memory organization is proposed using the failure-probability and the yield-prediction models. The developed design strategy statistically sizes different transistors of the SRAM cell and optimizes the number of redundant columns to be used in the SRAM array, to minimize the failure probability of a memory chip under area and leakage constraints. The developed method can be used in an early stage of a design cycle to enhance memory yield in the nanometer regime.

Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS
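
As a rough illustration of how a cell-failure probability can feed a chip-level yield estimate with redundant columns, the sketch below uses a simple binomial model. The paper's actual formulation is not given in the abstract, so everything here is an assumption: cell failures are taken as independent, a column counts as faulty if any of its cells fails, and the chip is usable when the number of faulty columns does not exceed the number of redundant columns. All function and parameter names are hypothetical.

```python
from math import comb


def column_failure_prob(p_cell, rows):
    """Probability that at least one of `rows` independent cells fails."""
    return 1.0 - (1.0 - p_cell) ** rows


def chip_yield(p_cell, rows, cols, redundant_cols):
    """Probability that faulty columns can all be covered by redundancy.

    Assumption: redundant columns can fail too, so the binomial runs over
    cols + redundant_cols columns, and the chip works when at most
    redundant_cols of them are faulty.
    """
    p_col = column_failure_prob(p_cell, rows)
    total = cols + redundant_cols
    return sum(
        comb(total, k) * p_col**k * (1.0 - p_col) ** (total - k)
        for k in range(redundant_cols + 1)
    )


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    for r in (0, 2, 4):
        print(r, chip_yield(p_cell=1e-5, rows=256, cols=128, redundant_cols=r))
```

Sweeping `redundant_cols` in this way hints at the trade-off the abstract describes: more redundancy raises yield but costs area, so the number of redundant columns becomes an optimization variable.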

A detailed theoretical picture is given for the physics of strain effects in bulk semiconductors and surface Si, Ge, and III–V channel metal-oxide-semiconductor field-effect transistors. For the technologically important in-plane biaxial and longitudinal uniaxial stress, changes in energy band splitting and warping, effective mass, and scattering are investigated by symmetry, tight-binding, and k⋅p methods. The results show both types of stress split the Si conduction band while only longitudinal uniaxial stress along ⟨110⟩ splits the Ge conduction band. The longitudinal uniaxial stress warps the conduction band in all semiconductors. The physics of the strain altered valence bands for Si, Ge, and III–V semiconductors are shown to be similar although the strain enhancement of hole mobility is largest for longitudinal uniaxial compression in ⟨110⟩ channel devices and channel materials with substantial differences between heavy and light hole masses such as Ge and GaAs. Furthermore, for all these materials,...

Physics of strain effects in semiconductors and metal-oxide-semiconductor field-effect transistors

SRAM cell read stability and write-ability are major concerns in nanometer CMOS technologies, due to the progressive increase in intra-die variability and Vdd scaling. This paper analyzes the read stability N-curve metrics and compares them with the commonly used static noise margin (SNM) metric defined by Seevinck. Additionally, new write-ability metrics derived from the same N-curve are introduced and compared with the traditional write-trip point definition. Analytical models of all these metrics are developed. It is demonstrated that the new metrics provide additional information in terms of current, which allows designing a more robust and stable cell. By taking into account this current information, Vdd scaling is no longer a limiting factor for the read stability of the cell. Finally, these metrics are used to investigate the impact of the intra-die variability on the stability of the cell by using a statistically-aware circuit optimization approach and the results are compared with the worst-case or corner-based design.

/pdf/read-stability-and-write-ability-analysis-of-sram-cells-for-x816hx0q38.pdf

Read Stability and Write-Ability Analysis of SRAM Cells for Nanometer Technologies

High leakage current in deep-submicrometer regimes is becoming a significant contributor to power dissipation of CMOS circuits as threshold voltage, channel length, and gate oxide thickness are reduced. Consequently, the identification and modeling of different leakage components is very important for estimation and reduction of leakage power, especially for low-power applications. This paper reviews various transistor intrinsic leakage mechanisms, including weak inversion, drain-induced barrier lowering, gate-induced drain leakage, and gate oxide tunneling. Channel engineering techniques including retrograde well and halo doping are explained as means to manage short-channel effects for continuous scaling of CMOS devices. Finally, the paper explores different circuit techniques to reduce the leakage power consumption.

Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits

A leakage-tolerant design technique for high fan-in dynamic logic circuits is presented. An NMOS transistor with gate and drain terminals tied together (diode) is added in series with the evaluation network of standard domino circuits. Due to the stacking effect, the leakage of the evaluation path significantly decreases, thereby improving the robustness of the circuit against deep-submicron subthreshold leakage and input noise. To improve the speed of the circuit, a current mirror is also employed in the evaluation network to increase the evaluation current. The proposed technique (diode-footed domino) exhibits considerable improvement in leakage and noise immunity as compared to the standard domino circuits. Simulation results of wide fan-in gates designed using Berkeley Predictive Technology Models of 70-nm technology demonstrate at least 1.9× noise-immunity improvement at the same delay compared to the standard domino circuits. Dynamic comparators and multiplexers are designed using the diode-footed domino and conventional techniques to demonstrate the effectiveness of the proposed scheme in improving leakage-tolerance and performance of high fan-in circuits.

http://online.sfsu.edu/mahmoodi/papers/paper_J5.pdf

Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design style

In nano-scaled CMOS circuits, the random dopant fluctuations cause significant threshold voltage (Vt) variations in transistors. In this paper, we propose a semi-analytical estimation methodology to predict the delay distribution (mean and standard deviation) of logic circuits considering Vt variation in transistors. The proposed method is fast and can be used to predict delay distribution in nano-scaled CMOS technologies both at the circuit and the device design phase.

http://online.sfsu.edu/mahmoodi/papers/paper_J9.pdf

Estimation of delay variations due to random-dopant fluctuations in nano-scaled CMOS circuits

This paper presents a programmable digital finite-impulse response (FIR) filter for high-performance and low-power applications. The architecture is based on a computation sharing multiplier (CSHM) which specifically targets computation re-use in vector-scalar products and can be effectively used in the low-complexity programmable FIR filter design. Efficient circuit-level techniques, namely a new carry-select adder and conditional capture flip-flop (CCFF), are also used to further improve power and performance. A 10-tap programmable FIR filter was implemented and fabricated in CMOS 0.25-μm technology based on the proposed architectural and circuit-level techniques. The chip's core contains approximately 130 K transistors and occupies 9.93 mm² area.

http://online.sfsu.edu/mahmoodi/papers/paper_J4.pdf

Computation sharing programmable FIR filter for low-power and high-performance applications

In this paper we have analyzed and modeled the failure probabilities (access time failure, read/write stability failure, and hold stability failure in the stand-by mode) of SRAM cells due to process parameter variations. A method to predict the yield of a memory chip designed with a cell is proposed based on the cell failure probability. The developed method can be used in the early stage of a design cycle to optimize the design for yield enhancement.

http://userwww.sfsu.edu/mahmoodi/papers/paper_C14.pdf

H. Mahmoodi-Meimand

Papers

Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits

Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design style

Estimation of delay variations due to random-dopant fluctuations in nano-scaled CMOS circuits

Computation sharing programmable FIR filter for low-power and high-performance applications

Modeling and estimation of failure probability due to parameter variations in nano-scale SRAMs for yield enhancement