Modeling the Effect of Technology Trends on
Soft Error Rate of Combinational Logic
Premkishore Shivakumar Michael Kistler
Stephen W. Keckler Doug Burger Lorenzo Alvisi
Department of Computer Sciences
University of Texas at Austin
Austin, TX USA
{pkishore,kistler,skeckler,dburger,lorenzo}@cs.utexas.edu
IBM Technical Contacts: John Keaty, Rob Bell, and Ram Rajamony
Abstract
This paper examines the effect of technology scaling and
microarchitectural trends on the rate of soft errors in CMOS
memory and logic circuits. We describe and validate an
end-to-end model that enables us to compute the soft er-
ror rates (SER) for existing and future microprocessor-style
designs. The model captures the effects of two impor-
tant masking phenomena, electrical masking and latching-
window masking, which inhibit soft errors in combinational
logic. We quantify the SER in combinational logic and
latches for feature sizes from 600nm to 50nm and clock
periods from 16 to 6 fan-out-of-4 delays. Our model predicts
that the SER per chip of logic circuits will increase nine
orders of magnitude from 1992 to 2011 and at that point will
be comparable to the SER per chip of unprotected memory
elements. Our result emphasizes the need for computer sys-
tem designers to address the risks of SER in logic circuits in
future designs.
1 Introduction
Two important trends driving microprocessor perfor-
mance are scaling of device feature sizes and increasing
pipeline depths. In this paper we explore how these trends
affect the susceptibility of microprocessors to soft errors.
Device scaling is reduction in feature size and voltage lev-
els of the basic devices on the microprocessor. The ba-
sic motivation for device scaling is to improve processor
performance, since smaller devices require less current to
turn on or off, and thus can be operated at higher frequencies.
(One of the authors is also employed at the IBM Austin
Research Laboratory.) Pipelining is a microarchitectural
technique for improving performance by increasing instruction
level parallelism (ILP). Pipelining is a well accepted and
almost universally adopted technique in microprocessor
design. Five
stage pipelines are quite common, and even processors with
six to eight pipeline stages are considered to be relatively
simple designs. Recent processors have aggressively ap-
plied the techniques of pipelining, with some current de-
signs using upwards of twenty stages [9]. Such designs are
commonly referred to as superpipelined designs.
Our study focuses on soft errors, which are also called
transient faults or single-event upsets (SEUs). These are
errors in processor execution that are not due to design or
manufacturing defects, but instead due to electrical noise or
external radiation. In particular, we are interested in soft
errors caused by cosmic rays. The existence of cosmic ray
radiation has been known for over 50 years, and the capac-
ity for this radiation to create transient faults in semicon-
ductor circuits has been studied since the early 1980s. As
a result, most modern microprocessors already incorporate
mechanisms for detecting soft errors. These mechanisms
are typically focused on protecting memory elements, par-
ticularly caches, using error-correcting codes (ECC), par-
ity, and other techniques. Two key reasons for this focus
on memory elements are: 1) the techniques for protect-
ing memory elements are well understood and relatively in-
expensive in terms of the extra circuitry required, and 2)
caches take up a large part, and in some cases a majority, of
the chip area in modern microprocessors.
Past research has shown that combinational logic is
much less susceptible to soft errors than memory elements.
This is because three phenomena provide combinational

logic a form of natural resistance to soft errors: 1) logi-
cal masking, 2) electrical masking, and 3) latching-window
masking. We develop models for electrical masking and
latching-window masking to determine how these are af-
fected by device scaling and superpipelining. Then based
on a composite model we estimate the effects of these tech-
nology trends on the soft error rate (SER) of combinational
logic. Finally using an overall chip area model we com-
pare the SER/chip of combinational logic with the expected
trends in SER of memory elements.
The primary contribution of our work is an analysis of
the trends in SER for SRAM cells, latches, and combina-
tional logic. Our models predict that by 2011 the soft error
rate in combinational logic will be comparable to that of
unprotected memory elements. This is extremely signifi-
cant because current methods for protecting combinational
logic have significant costs in terms of chip area, perfor-
mance, and/or power consumption in comparison to protec-
tion mechanisms for memory elements. Technology trends
will lead to a significant reduction in both electrical and
latching-window masking, which accounts for a major por-
tion of the increase in SER of combinational logic.
The rest of this paper is organized as follows. Section 2
provides background on the nature of soft errors, and a
method for estimating the soft error rate of memory cir-
cuits. Section 3 introduces our definition of soft errors in
combinational logic, and examines the phenomena that can
mask soft errors in combinational logic. Section 4 describes
in detail our methodology for estimating the soft error rate
in combinational logic. We present our results in Section 5.
Section 6 discusses the implications of our analysis and
simulations. Section 7 summarizes the related work, and
Section 8 concludes the paper.
2 Background
2.1 Particles that cause soft errors
In the early 1980s, IBM conducted a series of experi-
ments to measure the particle flux [27]. Flux is generally a
measure of rate of flow; in this paper the flux of cosmic ray
particles is expressed as the number of particles of a particu-
lar energy per square centimeter per second. For our work,
the most important aspect of these results is that particles
of lower energy occur far more frequently than particles of
higher energy. In particular, a one order of magnitude
difference in energy can correspond to a two orders of magnitude
larger flux for the lower energy particles. As CMOS device
sizes decrease, they are more easily affected by these lower
energy particles, potentially leading to a much higher rate
of soft errors.
This paper investigates the soft error rate of combina-
tional logic caused by atmospheric neutrons with energies
greater than 1 mega-electron-volt (MeV). This form of ra-
diation, the result of cosmic rays colliding with particles
in the atmosphere, is known to be a significant source of
soft errors in memory elements. We do not consider at-
mospheric neutrons with energy less than 1 MeV since we
believe their much lower energies are less likely to result in
soft errors in combinational logic. We also do not consider
alpha particles, since this form of radiation comes almost
entirely from impurities in packaging material, and thus can
vary widely for processors at a particular technology. The
contribution to the overall soft error rate from each of these
radiation sources is additive, and thus each component can
be studied independently.
2.2 Soft Errors in Memory Circuits
In most modern microprocessors, combinational logic
and memory elements are constructed from the same basic
devices, NMOS and PMOS transistors. Therefore, we
can utilize techniques for estimating the SER in memory el-
ements to assess soft errors in combinational logic. We will
also use these techniques directly to compute the SER in
memory elements for a range of device sizes, and compare
the results to our estimates of SER for combinational logic.
High-energy neutrons lose energy in materials mainly
through collisions with silicon nuclei that lead to a chain of
secondary reactions. These reactions deposit a dense track
of electron-hole pairs as they pass through a p-n junction.
Some of the deposited charge will recombine, and some
will be collected at the junction contacts. When a parti-
cle strikes a sensitive region of an SRAM cell, the charge
that accumulates could exceed the minimum charge that is
needed to flip the value stored in the cell, resulting in a soft
error. The smallest charge that results in a soft error is
called the critical charge (Q_crit) of the SRAM cell [6].
The rate at which soft errors occur is typically expressed
in terms of Failures In Time (FIT), which measures the
number of failures per 10^9 hours of operation. A number
of studies on soft errors in SRAMs have concluded that the
SER for constant-area SRAM arrays will increase as device
sizes decrease [21, 20, 12], though researchers differ on
the rate of this increase.
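To make the FIT unit concrete, a FIT rate converts directly to a mean time between failures; the 10,000 FIT figure below is an illustrative number, not a result from this paper:

```python
def fit_to_mtbf_hours(fit):
    # FIT counts failures per 10**9 device-hours, so the mean time
    # between failures is simply 10**9 / FIT hours.
    return 1e9 / fit

# A hypothetical chip-level SER of 10,000 FIT:
mtbf = fit_to_mtbf_hours(10_000)
print(mtbf)               # 100000.0 hours
print(mtbf / (24 * 365))  # about 11.4 years
```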
A method for estimating SER in CMOS SRAM circuits
was recently developed by Hazucha & Svensson [8]. This
model estimates SER due to atmospheric neutrons (neutrons
with energies above 1 MeV) for a range of submicron feature
sizes. It is based on a verified empirical model for the
600nm technology, which is then scaled to other technolo-
gies. The basic form of this model is:

SER = F × A × exp(−Q_crit / Q_S)     (1)

where F is the neutron flux with energy above 1 MeV (in
n·cm⁻²·s⁻¹), A is the area of the circuit sensitive to
particle strikes (in cm²), Q_crit is the critical charge
(in fC), and Q_S is the charge collection efficiency of
the device (in fC).
Two key parameters in this model are the critical charge
(Q_crit) of the SRAM cell and the charge collection
efficiency (Q_S) of the circuit. Q_crit depends on
characteristics of the circuit, particularly the supply
voltage and the effective capacitance of drain nodes. Q_S
is a measure of the magnitude of charge generated by a
particle strike. These two parameters are essentially
independent, but both decrease with decreasing feature
size. From Equation 1 we see that SER will increase
exponentially as Q_crit becomes comparable to Q_S. SER is
also proportional to the area of the sensitive region of
the device, and therefore decreases proportional to the
square of the device size. Hazucha & Svensson used this
model to evaluate the effect of device scaling on the SER
of memory circuits. They concluded that SER per chip of
SRAM circuits should increase at most linearly with
decreasing feature size.
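The scaling argument above can be sketched numerically. The values below are placeholders (normalized flux, arbitrary area and charge numbers), chosen only to show that the exponential term in Equation 1 can outweigh the quadratic shrinkage of the sensitive area:

```python
import math

def relative_ser(flux, area, q_crit, q_s):
    # Equation 1, up to a technology-dependent constant:
    # SER = F * A * exp(-Q_crit / Q_S)
    return flux * area * math.exp(-q_crit / q_s)

# Hypothetical scaling step: sensitive area shrinks 4x, Q_crit halves,
# but Q_S shrinks more slowly (20 fC -> 10 fC vs 5 fC -> 4 fC).
base   = relative_ser(flux=1.0, area=1.0,  q_crit=20.0, q_s=5.0)
scaled = relative_ser(flux=1.0, area=0.25, q_crit=10.0, q_s=4.0)
print(scaled / base)  # about 1.12: SER rises despite the 4x smaller area
```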
3 Soft Errors in Combinational Logic
A particle that strikes a p-n junction within a combina-
tional logic circuit can alter the value generated by the cir-
cuit. However, a transient change in the value of a logic
circuit will not affect the results of a computation unless it
is captured in a memory circuit. Therefore, we define a soft
error in combinational logic as a transient error in the result
of a logic circuit that is subsequently stored in a memory
circuit of the processor.
A transient error in a logic circuit might not be captured
in a memory circuit because it might be masked by one of
the following three phenomena:
Logical masking occurs when a particle strikes a por-
tion of the combinational logic that is blocked from
affecting the output due to a subsequent gate whose
result is completely determined by its other input val-
ues.
Electrical Masking occurs when the pulse resulting
from a particle strike is attenuated by subsequent logic
gates due to the electrical properties of the gates to the
point that it does not affect the result of the circuit.
Latching-Window Masking occurs when the pulse
resulting from a particle strike reaches a latch, but not
at the clock transition where the latch captures its input
value.
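Logical masking is easy to illustrate with a two-input AND gate (an illustrative example, not a circuit from the paper): a transient flip on one input cannot reach the output when the other input already determines it:

```python
def and_gate(a, b):
    return a & b

clean    = and_gate(1, 0)  # correct inputs: output is 0
glitched = and_gate(0, 0)  # a particle strike flips the first input

# The second input (0) fully determines the output, so the
# transient is logically masked and never reaches a latch.
print(clean == glitched)  # True
```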
These masking effects have been found to result in a sig-
nificantly lower rate of soft errors in combinational logic
in comparison to storage circuits in equivalent device tech-
nology [16]. However, these effects will diminish signif-
icantly as feature sizes decrease and the number of stages
in the processor pipeline increases. For example, electrical
masking will be reduced by device scaling because smaller
transistors are also faster and therefore will have less atten-
uation effect on the pulse. Also, deeper processor pipelines
result in higher clock rates, which means the latches in the
processor will cycle more frequently, which reduces the op-
portunity for latching-window masking.
3.1 Combinational Logic Model
The datapath of modern processors can be extremely
complicated in nature, typically composed of 64 parallel
bit lines and divided into 20 or more pipeline stages. We
have chosen to use a much simpler model for the purposes
of estimating the SER of combinational logic. Our model
is just a one-wide chain of homogeneous gates terminat-
ing in a latch. Figure 1 illustrates this pipeline model. The
gates we use in our study are all static combinational logic
gates. Many modern microprocessors also employ dynamic
logic because it occupies less area and offers greater flexi-
bility for techniques such as time borrowing. These devices
are commonly designed for high performance, and as a re-
sult have lower noise margins and may be more suscepti-
ble to soft errors. We believe our model can be extended
to estimate the SER for dynamic logic and other circuit
styles. The number of gates in the chain is dependent on
the degree of pipelining in the microarchitecture, which we
characterize by the number of fan-out-of-4 inverter (FO4)
gates that can be placed between two latches in a single
pipeline stage. The FO4 metric is technology independent
and 1 FO4 roughly corresponds to 360 pico-seconds times
the transistor’s drawn gate length in microns [10]. During
the last twelve years technology has scaled from 1000nm to
130nm and the amount of logic per pipeline stage has
decreased from 84 to 12 FO4, contributing to a 60-fold
increase in clock frequency in the Intel family of processors.
Aggressive pipelining could reduce this to as few as 6 FO4
in five to seven years from now. For a given degree of
pipelining, the number of gates in the pipestage is the
largest number whose total delay does not exceed the delay
of the corresponding FO4 chain.
In our model, a latch consists of a passgate, a forward
inverter and a feedback inverter, where the forward inverter
is about 6 times larger than the feedback inverter and the
transistors are all of minimum length. We use level sensitive
latches in our pipeline model because they occupy less area
than edge triggered flip-flops and so are more suitable for
superpipelining. They also allow for time borrowing
techniques and offer less load to the clock distribution
network, thus reducing the clock skew in the chip.

Figure 1. Simple model for a processor pipeline: a one-wide pipestage composed of one type of gate, terminating in a latch.
4 Methodology
Our methodology for estimating the soft error rate in
combinational logic considers the impact of CMOS device
scaling and the microarchitectural trend toward increasing
depth of processor pipelines. We determine the soft er-
ror rate using analytical models for each stage of the pulse
from its creation to the time it reaches the latch. Figure 2
shows the various stages the pulse passes through and the
corresponding model used to determine the effect on the
pulse at that stage. In the first stage an error current pulse
is produced from the charge and the corresponding voltage
pulse is also generated. The electrical masking model simu-
lates the effect of the electrical properties of the gates on the
pulse. Finally a model for the latching window determines
the probability that the pulse is successfully latched. The
following sections describe each model in detail.
4.1 Device Scaling model
In practice, technology parameters are scaled to achieve
certain physical objectives such as constant power den-
sity or constant electric field strength. Our method
for constructing technology parameters uses values from
the Semiconductor Industry Association (SIA) technology
roadmap [25], with minor adjustments to ensure that the
delay of a fan-out-of-4 (FO4) inverter satisfies Equation 2.
These technology parameters are used in the circuit simula-
tions and analytical models for estimating SER of combina-
tional logic.
FO4 delay (in ps) = 360 × feature size (in µm)     (2)
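Equation 2 and the pipestage construction from Section 3.1 can be sketched together; the 54 ps gate delay below is a hypothetical value used only to show how the gate count falls out of the FO4 budget:

```python
def fo4_delay_ps(feature_size_um):
    # Equation 2: FO4 delay (ps) ~= 360 * drawn gate length (um).
    return 360.0 * feature_size_um

def gates_per_pipestage(fo4_per_stage, feature_size_um, gate_delay_ps):
    # Largest number of gates whose total delay does not exceed
    # the delay of the corresponding FO4 chain.
    stage_delay_ps = fo4_per_stage * fo4_delay_ps(feature_size_um)
    return int(stage_delay_ps // gate_delay_ps)

# A 100nm (0.1 um) technology: 36 ps per FO4, so a 12 FO4 pipestage
# with hypothetical 54 ps gates fits 8 gates between latches.
print(fo4_delay_ps(0.1))                   # 36.0
print(gates_per_pipestage(12, 0.1, 54.0))  # 8
```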
4.2 Charge to Voltage Pulse model
When a particle strikes a device it produces a current
pulse with a very rapid rise time, but a more gradual fall
time. The shape of the pulse can be approximated by a
one-parameter function [6] shown in Equation 3.

I(t) = (2Q / (T√π)) × √(t/T) × exp(−t/T)     (3)

Q refers to the amount of charge collected due to the
particle strike. The parameter T is the time constant of the
transistor for the charge collection process. If T is large
it takes more time for the charge to recombine, and if T is
small the current pulse dies faster. The rapid rise of the
current pulse is captured in the square root function, and the
gradual fall of the current pulse is produced by the negative
exponential dependence.
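A useful property of Equation 3 is that its 2/(T√π) prefactor normalizes the pulse so that the current integrates to exactly the collected charge Q. A sketch verifying this numerically (the Q and T values are placeholders in arbitrary units):

```python
import math

def current_pulse(t, q, tau):
    # Equation 3: I(t) = (2Q / (tau*sqrt(pi))) * sqrt(t/tau) * exp(-t/tau)
    return (2.0 * q / (tau * math.sqrt(math.pi))) \
        * math.sqrt(t / tau) * math.exp(-t / tau)

def collected_charge(q, tau, horizon=50.0, steps=100_000):
    # Trapezoidal integration of the pulse out to `horizon` time
    # constants; the tail beyond that is negligible.
    dt = horizon * tau / steps
    total = 0.0
    for i in range(steps):
        total += 0.5 * (current_pulse(i * dt, q, tau)
                        + current_pulse((i + 1) * dt, q, tau)) * dt
    return total

print(collected_charge(q=20.0, tau=1.0))  # very close to 20.0
```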
The current pulse produced by a particle strike results in
a voltage pulse at the output node of the device. We use
hspice to determine the characteristics of this voltage pulse.
The voltage pulse is described in a three-parameter form
(rise time, fall time, and effective width, i.e. the width of
the pulse at half the supply voltage) and is used as input to
the electrical masking analytical model.
4.3 Electrical Masking Model
Electrical masking is the composition of two electri-
cal effects that reduce the strength of a pulse as it passes
through a logic gate. Circuit delays caused by the switch-
ing time of the transistors cause the rise and fall time of
the pulse to increase. Also, the amplitude of a pulse with
short duration may decrease since the gate may start to turn
off before the output reaches its full amplitude. The com-
bination of these two effects reduce the width of a pulse,
making it less likely to cause a soft error. These effects are
illustrated in Figure 3. The effect cascades from one gate to
the next because at each gate the slope decreases and hence
the amplitude also decreases.

Figure 2. Overview of the process to determine if a charge leads to a soft error. (The figure shows the pipestage setup followed by the chain of models: charge to current pulse and current to voltage pulse (Models I, II), electrical masking via the Horowitz and degradation models (Model III), and the latching-window test of whether the error is latched (Models IV, V).)
Figure 3. The electrical masking effect: degradation of a pulse passing through a transistor.
We constructed a model for electrical masking by inte-
grating two existing models. We use the Horowitz rise and
fall time model [11] to determine the rise and fall time of
the output pulse, and the Logical Delay Degradation Effect
Model [3] to determine the amplitude, and hence the dura-
tion, of the output pulse.
Horowitz rise and fall time model: The Horowitz model
calculates the rise and fall time of the output pulse based on
the gate switching voltages, CMOS model parameters and
the input rise/fall time. The model is very sensitive to the
values for the rise and fall switching voltages of the gates.
We used an iterative bisection method to determine val-
ues for the switching voltages. This procedure adjusted the
switching voltages until the rise and fall times predicted by
the model were within 15% of values obtained from hspice
simulations.
Delay degradation model: Delay degradation occurs
when an input transition occurs before the gate has com-
pletely switched from its previous transition. The new input
transition causes the gate to switch in the opposite direc-
tion leading to a degradation in the amplitude of the output
pulse. We use the “Delay Degradation Model” proposed
and validated by Bellido-Diaz et al. [3] to determine how
a voltage pulse degrades as it passes through a logic gate.
This model determines the amplitude of the output pulse
based on the time between the output transition and the next
input transition, and the time needed for the gate to switch
fully. These parameters are illustrated in Figure 3.
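The cascading nature of electrical masking can be sketched with a deliberately crude attenuation rule. The rule below (a pulse within roughly twice a hypothetical 30 ps gate switching time loses width and amplitude; a narrower pulse is absorbed entirely) is a stand-in for the actual Horowitz and delay degradation equations:

```python
def attenuate(width_ps, amplitude_v, gate_switch_ps=30.0, vdd=1.0):
    # One gate of the chain (stand-in rule): pulses much wider than
    # the switching time pass unchanged; marginal pulses shrink in
    # both width and amplitude; short pulses are fully masked.
    if width_ps <= gate_switch_ps:
        return 0.0, 0.0
    scale = min(1.0, (width_ps - gate_switch_ps) / gate_switch_ps)
    return (width_ps - gate_switch_ps * (1.0 - scale),
            min(vdd, amplitude_v * scale))

# A 50 ps full-rail pulse degrades at every stage and is fully
# masked before it reaches the fourth gate.
width, amp = 50.0, 1.0
for gate in range(1, 5):
    width, amp = attenuate(width, amp)
    print(gate, round(width, 1), round(amp, 3))
```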
4.4 Pulse latching model
Recall that our definition of a soft error in combinational
logic requires that an error pulse is captured in a memory
circuit. In our model, this means that the pulse is stored
into the level-sensitive latch at the end of a pipeline stage.
We only consider a value to be stored in the latch if it is
present (and stable) when the latch closes, since it is this
value that is then passed to the next pipeline stage.
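Before describing the hspice procedure, it helps to see the underlying geometry: a pulse arriving at a uniformly random phase of the clock is captured only if it covers the entire window around the latch-closing edge during which the input must be stable. A simplified closed-form sketch (the d, w, and c values below are illustrative, not measured):

```python
def latch_probability(d, w, c):
    # Probability that a pulse of duration d, uniformly placed in a
    # clock period c, completely covers a latching window of width w:
    # P = (d - w) / c, clipped to [0, 1] (simplified model).
    if d <= w:
        return 0.0
    return min(1.0, (d - w) / c)

# A 60 ps pulse against a 20 ps window in a 250 ps cycle:
print(latch_probability(60.0, 20.0, 250.0))  # 0.16
```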
When a voltage pulse reaches the input of a latch, we use
an hspice simulation to determine if it has sufficient ampli-
tude and duration to be captured by the latch. The simula-
tion is done in two steps. First we determine the pulse start
time, the shortest time between the rising edge of the pulse
and clock edge for which the pulse could be latched. This is
similar to a setup time analysis for the latch, except that the
input data waveform has the slope of the pulse at the latch
input. The second step is to determine the minimum du-
ration (measured at the threshold voltage) pulse that could
be latched. For this step, we position the rising edge of the
pulse at the point determined in the first step, and then vary
the duration until the minimum value is determined. We
studied the nature of the pulse start time and minimum du-
ration using separate experiments and found that the pulse
start time is a linear function of the rise time of the pulse,
and the minimum duration is a linear function of the rise time.
