Modeling the Effect of Technology Trends on
Soft Error Rate of Combinational Logic
Premkishore Shivakumar Michael Kistler
Stephen W. Keckler Doug Burger Lorenzo Alvisi
Department of Computer Sciences
University of Texas at Austin
Austin, TX USA
{pkishore,kistler,skeckler,dburger,lorenzo}@cs.utexas.edu
IBM Technical Contacts: John Keaty, Rob Bell, and Ram Rajamony
Abstract
This paper examines the effect of technology scaling and
microarchitectural trends on the rate of soft errors in CMOS
memory and logic circuits. We describe and validate an
end-to-end model that enables us to compute the soft er-
ror rates (SER) for existing and future microprocessor-style
designs. The model captures the effects of two impor-
tant masking phenomena, electrical masking and latching-
window masking, which inhibit soft errors in combinational
logic. We quantify the SER in combinational logic and
latches for feature sizes from 600nm to 50nm and clock
periods from 16 to 6 fan-out-of-4 delays. Our model predicts
that the SER per chip of logic circuits will increase nine
orders of magnitude from 1992 to 2011 and at that point will
be comparable to the SER per chip of unprotected memory
elements. Our result emphasizes the need for computer sys-
tem designers to address the risks of SER in logic circuits in
future designs.
1 Introduction
Two important trends driving microprocessor perfor-
mance are scaling of device feature sizes and increasing
pipeline depths. In this paper we explore how these trends
affect the susceptibility of microprocessors to soft errors.
Device scaling is reduction in feature size and voltage lev-
els of the basic devices on the microprocessor. The ba-
sic motivation for device scaling is to improve processor
performance, since smaller devices require less current to
turn on or off, and thus can be operated at higher frequencies.
(One of the authors is also employed at the IBM Austin
Research Laboratory.) Pipelining is a microarchitectural
technique for improving performance by increasing instruction
level parallelism (ILP). Pipelining is a well accepted and
almost universally adopted technique in microprocessor
design. Five
stage pipelines are quite common, and even processors with
six to eight pipeline stages are considered to be relatively
simple designs. Recent processors have aggressively ap-
plied the techniques of pipelining, with some current de-
signs using upwards of twenty stages [9]. Such designs are
commonly referred to as superpipelined designs.
Our study focuses on soft errors, which are also called
transient faults or single-event upsets (SEUs). These are
errors in processor execution that are not due to design or
manufacturing defects, but instead due to electrical noise or
external radiation. In particular, we are interested in soft
errors caused by cosmic rays. The existence of cosmic ray
radiation has been known for over 50 years, and the capac-
ity for this radiation to create transient faults in semicon-
ductor circuits has been studied since the early 1980s. As
a result, most modern microprocessors already incorporate
mechanisms for detecting soft errors. These mechanisms
are typically focused on protecting memory elements, par-
ticularly caches, using error-correcting codes (ECC), par-
ity, and other techniques. Two key reasons for this focus
on memory elements are: 1) the techniques for protect-
ing memory elements are well understood and relatively in-
expensive in terms of the extra circuitry required, and 2)
caches take up a large part, and in some cases a majority, of
the chip area in modern microprocessors.
Past research has shown that combinational logic is
much less susceptible to soft errors than memory elements.
This is because three phenomena provide combinational

logic a form of natural resistance to soft errors: 1) logi-
cal masking, 2) electrical masking, and 3) latching-window
masking. We develop models for electrical masking and
latching-window masking to determine how these are af-
fected by device scaling and superpipelining. Then based
on a composite model we estimate the effects of these tech-
nology trends on the soft error rate (SER) of combinational
logic. Finally using an overall chip area model we com-
pare the SER/chip of combinational logic with the expected
trends in SER of memory elements.
The primary contribution of our work is an analysis of
the trends in SER for SRAM cells, latches, and combina-
tional logic. Our models predict that by 2011 the soft error
rate in combinational logic will be comparable to that of
unprotected memory elements. This is extremely signifi-
cant because current methods for protecting combinational
logic have significant costs in terms of chip area, perfor-
mance, and/or power consumption in comparison to protec-
tion mechanisms for memory elements. Technology trends
will lead to a significant reduction in both electrical and
latching-window masking, which accounts for a major por-
tion of the increase in SER of combinational logic.
The rest of this paper is organized as follows. Section 2
provides background on the nature of soft errors, and a
method for estimating the soft error rate of memory cir-
cuits. Section 3 introduces our definition of soft errors in
combinational logic, and examines the phenomena that can
mask soft errors in combinational logic. Section 4 describes
in detail our methodology for estimating the soft error rate
in combinational logic. We present our results in Section 5.
Section 6 discusses the implications of our analysis and
simulations. Section 7 summarizes the related work, and
Section 8 concludes the paper.
2 Background
2.1 Particles that cause soft errors
In the early 1980s, IBM conducted a series of experi-
ments to measure the particle flux [27]. Flux is generally a
measure of rate of flow; in this paper the flux of cosmic ray
particles is expressed as the number of particles of a particu-
lar energy per square centimeter per second. For our work,
the most important aspect of these results is that particles
of lower energy occur far more frequently than particles of
higher energy. In particular, a one order of magnitude
difference in energy can correspond to a two orders of magnitude
larger flux for the lower energy particles. As CMOS device
sizes decrease, they are more easily affected by these lower
energy particles, potentially leading to a much higher rate
of soft errors.
This paper investigates the soft error rate of combina-
tional logic caused by atmospheric neutrons with energies
greater than 1 mega-electron-volt (MeV). This form of ra-
diation, the result of cosmic rays colliding with particles
in the atmosphere, is known to be a significant source of
soft errors in memory elements. We do not consider at-
mospheric neutrons with energy less than 1 MeV since we
believe their much lower energies are less likely to result in
soft errors in combinational logic. We also do not consider
alpha particles, since this form of radiation comes almost
entirely from impurities in packaging material, and thus can
vary widely for processors at a particular technology. The
contribution to the overall soft error rate from each of these
radiation sources is additive, and thus each component can
be studied independently.
2.2 Soft Errors in Memory Circuits
In most modern microprocessors, combinational logic
and memory elements are constructed from the same basic
devices, NMOS and PMOS transistors. Therefore, we
can utilize techniques for estimating the SER in memory el-
ements to assess soft errors in combinational logic. We will
also use these techniques directly to compute the SER in
memory elements for a range of device sizes, and compare
the results to our estimates of SER for combinational logic.
High-energy neutrons lose energy in materials mainly
through collisions with silicon nuclei that lead to a chain of
secondary reactions. These reactions deposit a dense track
of electron-hole pairs as they pass through a p-n junction.
Some of the deposited charge will recombine, and some
will be collected at the junction contacts. When a parti-
cle strikes a sensitive region of an SRAM cell, the charge
that accumulates could exceed the minimum charge that is
needed to flip the value stored in the cell, resulting in a soft
error. The smallest charge that results in a soft error is
called the critical charge (Q_crit) of the SRAM cell [6].
The rate at which soft errors occur is typically expressed
in terms of Failures In Time (FIT), which measures the
number of failures per 10^9 hours of operation. A number
of studies on soft errors in SRAMs have concluded that the
SER for constant-area SRAM arrays will increase as device
sizes decrease [21, 20, 12], though researchers differ on
the rate of this increase.
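To make the FIT unit concrete, a FIT rate converts directly to a mean time between failures; the 10,000 FIT figure below is an illustrative number, not a result from this paper:

```python
def fit_to_mtbf_hours(fit):
    # FIT counts failures per 10**9 device-hours, so the mean time
    # between failures is simply 10**9 / FIT hours.
    return 1e9 / fit

# A hypothetical chip-level SER of 10,000 FIT:
mtbf = fit_to_mtbf_hours(10_000)
print(mtbf)               # 100000.0 hours
print(mtbf / (24 * 365))  # about 11.4 years
```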
A method for estimating SER in CMOS SRAM circuits
was recently developed by Hazucha & Svensson [8]. This
model estimates SER due to atmospheric neutrons (neutrons
with energies above 1 MeV) for a range of submicron feature
sizes. It is based on a verified empirical model for the
600nm technology, which is then scaled to other technolo-
gies. The basic form of this model is:

SER = F × A × exp(−Q_crit / Q_S)     (1)

where F is the neutron flux with energy above 1 MeV (in
n·cm⁻²·s⁻¹), A is the area of the circuit sensitive to
particle strikes (in cm²), Q_crit is the critical charge
(in fC), and Q_S is the charge collection efficiency of
the device (in fC).
Two key parameters in this model are the critical charge
(Q_crit) of the SRAM cell and the charge collection
efficiency (Q_S) of the circuit. Q_crit depends on
characteristics of the circuit, particularly the supply
voltage and the effective capacitance of drain nodes. Q_S
is a measure of the magnitude of charge generated by a
particle strike. These two parameters are essentially
independent, but both decrease with decreasing feature
size. From Equation 1 we see that SER will increase
exponentially as Q_crit becomes comparable to Q_S. SER is
also proportional to the area of the sensitive region of
the device, and therefore decreases proportional to the
square of the device size. Hazucha & Svensson used this
model to evaluate the effect of device scaling on the SER
of memory circuits. They concluded that SER per chip of
SRAM circuits should increase at most linearly with
decreasing feature size.
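The scaling argument above can be sketched numerically. The values below are placeholders (normalized flux, arbitrary area and charge numbers), chosen only to show that the exponential term in Equation 1 can outweigh the quadratic shrinkage of the sensitive area:

```python
import math

def relative_ser(flux, area, q_crit, q_s):
    # Equation 1, up to a technology-dependent constant:
    # SER = F * A * exp(-Q_crit / Q_S)
    return flux * area * math.exp(-q_crit / q_s)

# Hypothetical scaling step: sensitive area shrinks 4x, Q_crit halves,
# but Q_S shrinks more slowly (20 fC -> 10 fC vs 5 fC -> 4 fC).
base   = relative_ser(flux=1.0, area=1.0,  q_crit=20.0, q_s=5.0)
scaled = relative_ser(flux=1.0, area=0.25, q_crit=10.0, q_s=4.0)
print(scaled / base)  # about 1.12: SER rises despite the 4x smaller area
```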
3 Soft Errors in Combinational Logic
A particle that strikes a p-n junction within a combina-
tional logic circuit can alter the value generated by the cir-
cuit. However, a transient change in the value of a logic
circuit will not affect the results of a computation unless it
is captured in a memory circuit. Therefore, we define a soft
error in combinational logic as a transient error in the result
of a logic circuit that is subsequently stored in a memory
circuit of the processor.
A transient error in a logic circuit might not be captured
in a memory circuit because it might be masked by one of
the following three phenomena:
Logical masking occurs when a particle strikes a por-
tion of the combinational logic that is blocked from
affecting the output due to a subsequent gate whose
result is completely determined by its other input val-
ues.
Electrical Masking occurs when the pulse resulting
from a particle strike is attenuated by subsequent logic
gates due to the electrical properties of the gates to the
point that it does not affect the result of the circuit.
Latching-Window Masking occurs when the pulse
resulting from a particle strike reaches a latch, but not
at the clock transition where the latch captures its input
value.
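Logical masking is easy to illustrate with a two-input AND gate (an illustrative example, not a circuit from the paper): a transient flip on one input cannot reach the output when the other input already determines it:

```python
def and_gate(a, b):
    return a & b

clean    = and_gate(1, 0)  # correct inputs: output is 0
glitched = and_gate(0, 0)  # a particle strike flips the first input

# The second input (0) fully determines the output, so the
# transient is logically masked and never reaches a latch.
print(clean == glitched)  # True
```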
These masking effects have been found to result in a sig-
nificantly lower rate of soft errors in combinational logic
in comparison to storage circuits in equivalent device tech-
nology [16]. However, these effects will diminish signif-
icantly as feature sizes decrease and the number of stages
in the processor pipeline increases. For example, electrical
masking will be reduced by device scaling because smaller
transistors are also faster and therefore will have less atten-
uation effect on the pulse. Also, deeper processor pipelines
result in higher clock rates, which means the latches in the
processor will cycle more frequently, which reduces the op-
portunity for latching-window masking.
3.1 Combinational Logic Model
The datapath of modern processors can be extremely
complicated in nature, typically composed of 64 parallel
bit lines and divided into 20 or more pipeline stages. We
have chosen to use a much simpler model for the purposes
of estimating the SER of combinational logic. Our model
is just a one-wide chain of homogeneous gates terminat-
ing in a latch. Figure 1 illustrates this pipeline model. The
gates we use in our study are all static combinational logic
gates. Many modern microprocessors also employ dynamic
logic because it occupies less area and offers greater flexi-
bility for techniques such as time borrowing. These devices
are commonly designed for high performance, and as a re-
sult have lower noise margins and may be more suscepti-
ble to soft errors. We believe our model can be extended
to estimate the SER for dynamic logic and other circuit
styles. The number of gates in the chain is dependent on
the degree of pipelining in the microarchitecture, which we
characterize by the number of fan-out-of-4 inverter (FO4)
gates that can be placed between two latches in a single
pipeline stage. The FO4 metric is technology independent
and 1 FO4 roughly corresponds to 360 pico-seconds times
the transistor’s drawn gate length in microns [10]. During
the last twelve years technology has scaled from 1000nm to
130nm and the amount of logic per pipeline stage has
decreased from 84 to 12 FO4, contributing to a 60-fold
increase in clock frequency in the Intel family of processors.
Aggressive pipelining could reduce this to as few as 6 FO4
in five to seven years from now. For a given degree of
pipelining, the number of gates in the pipestage is the
largest number whose total delay does not exceed the delay
of the corresponding FO4 chain.
In our model, a latch consists of a passgate, a forward
inverter and a feedback inverter, where the forward inverter
is about 6 times larger than the feedback inverter and the
transistors are all of minimum length. We use level sensitive
latches in our pipeline model because they occupy less area
than edge triggered flip-flops and so are more suitable for
superpipelining. They also allow for time borrowing
techniques and offer less load to the clock distribution
network, thus reducing the clock skew in the chip.

Figure 1. Simple model for a processor pipeline: a one-wide pipestage composed of one type of gate, terminating in a latch.
4 Methodology
Our methodology for estimating the soft error rate in
combinational logic considers the impact of CMOS device
scaling and the microarchitectural trend toward increasing
depth of processor pipelines. We determine the soft er-
ror rate using analytical models for each stage of the pulse
from its creation to the time it reaches the latch. Figure 2
shows the various stages the pulse passes through and the
corresponding model used to determine the effect on the
pulse at that stage. In the first stage an error current pulse
is produced from the charge and the corresponding voltage
pulse is also generated. The electrical masking model simu-
lates the effect of the electrical properties of the gates on the
pulse. Finally a model for the latching window determines
the probability that the pulse is successfully latched. The
following sections describe each model in detail.
4.1 Device Scaling model
In practice, technology parameters are scaled to achieve
certain physical objectives such as constant power den-
sity or constant electric field strength. Our method
for constructing technology parameters uses values from
the Semiconductor Industry Association (SIA) technology
roadmap [25], with minor adjustments to ensure that the
delay of a fan-out-of-4 (FO4) inverter satisfies Equation 2.
These technology parameters are used in the circuit simula-
tions and analytical models for estimating SER of combina-
tional logic.
FO4 delay (in ps) = 360 × feature size (in µm)     (2)
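Equation 2 and the pipestage construction from Section 3.1 can be sketched together; the 54 ps gate delay below is a hypothetical value used only to show how the gate count falls out of the FO4 budget:

```python
def fo4_delay_ps(feature_size_um):
    # Equation 2: FO4 delay (ps) ~= 360 * drawn gate length (um).
    return 360.0 * feature_size_um

def gates_per_pipestage(fo4_per_stage, feature_size_um, gate_delay_ps):
    # Largest number of gates whose total delay does not exceed
    # the delay of the corresponding FO4 chain.
    stage_delay_ps = fo4_per_stage * fo4_delay_ps(feature_size_um)
    return int(stage_delay_ps // gate_delay_ps)

# A 100nm (0.1 um) technology: 36 ps per FO4, so a 12 FO4 pipestage
# with hypothetical 54 ps gates fits 8 gates between latches.
print(fo4_delay_ps(0.1))                   # 36.0
print(gates_per_pipestage(12, 0.1, 54.0))  # 8
```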
4.2 Charge to Voltage Pulse model
When a particle strikes a device it produces a current
pulse with a very rapid rise time, but a more gradual fall
time. The shape of the pulse can be approximated by a
one-parameter function [6] shown in Equation 3.

I(t) = (2Q / (T√π)) × √(t/T) × exp(−t/T)     (3)

Q refers to the amount of charge collected due to the
particle strike. The parameter T is the time constant of the
transistor for the charge collection process. If T is large
it takes more time for the charge to recombine, and if T is
small the current pulse dies faster. The rapid rise of the
current pulse is captured in the square root function, and the
gradual fall of the current pulse is produced by the negative
exponential dependence.
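A useful property of Equation 3 is that its 2/(T√π) prefactor normalizes the pulse so that the current integrates to exactly the collected charge Q. A sketch verifying this numerically (the Q and T values are placeholders in arbitrary units):

```python
import math

def current_pulse(t, q, tau):
    # Equation 3: I(t) = (2Q / (tau*sqrt(pi))) * sqrt(t/tau) * exp(-t/tau)
    return (2.0 * q / (tau * math.sqrt(math.pi))) \
        * math.sqrt(t / tau) * math.exp(-t / tau)

def collected_charge(q, tau, horizon=50.0, steps=100_000):
    # Trapezoidal integration of the pulse out to `horizon` time
    # constants; the tail beyond that is negligible.
    dt = horizon * tau / steps
    total = 0.0
    for i in range(steps):
        total += 0.5 * (current_pulse(i * dt, q, tau)
                        + current_pulse((i + 1) * dt, q, tau)) * dt
    return total

print(collected_charge(q=20.0, tau=1.0))  # very close to 20.0
```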
The current pulse produced by a particle strike results in
a voltage pulse at the output node of the device. We use
hspice to determine the characteristics of this voltage pulse.
The voltage pulse is described in a three-parameter form
(rise time, fall time, and effective width, i.e. the width of
the pulse at half the supply voltage) and is used as input to
the electrical masking analytical model.
4.3 Electrical Masking Model
Electrical masking is the composition of two electri-
cal effects that reduce the strength of a pulse as it passes
through a logic gate. Circuit delays caused by the switch-
ing time of the transistors cause the rise and fall time of
the pulse to increase. Also, the amplitude of a pulse with
short duration may decrease since the gate may start to turn
off before the output reaches its full amplitude. The com-
bination of these two effects reduce the width of a pulse,
making it less likely to cause a soft error. These effects are
illustrated in Figure 3. The effect cascades from one gate to
the next because at each gate the slope decreases and hence
the amplitude also decreases.

Figure 2. Overview of the process to determine if a charge leads to a soft error. (The figure shows the pipestage setup followed by the chain of models: charge to current pulse and current to voltage pulse (Models I, II), electrical masking via the Horowitz and degradation models (Model III), and the latching-window test of whether the error is latched (Models IV, V).)
Figure 3. The electrical masking effect: degradation of a pulse passing through a transistor.
We constructed a model for electrical masking by inte-
grating two existing models. We use the Horowitz rise and
fall time model [11] to determine the rise and fall time of
the output pulse, and the Logical Delay Degradation Effect
Model [3] to determine the amplitude, and hence the dura-
tion, of the output pulse.
Horowitz rise and fall time model: The Horowitz model
calculates the rise and fall time of the output pulse based on
the gate switching voltages, CMOS model parameters and
the input rise/fall time. The model is very sensitive to the
values for the rise and fall switching voltages of the gates.
We used an iterative bisection method to determine val-
ues for the switching voltages. This procedure adjusted the
switching voltages until the rise and fall times predicted by
the model were within 15% of values obtained from hspice
simulations.
Delay degradation model: Delay degradation occurs
when an input transition occurs before the gate has com-
pletely switched from its previous transition. The new input
transition causes the gate to switch in the opposite direc-
tion leading to a degradation in the amplitude of the output
pulse. We use the “Delay Degradation Model” proposed
and validated by Bellido-Diaz et al. [3] to determine how
a voltage pulse degrades as it passes through a logic gate.
This model determines the amplitude of the output pulse
based on the time between the output transition and the next
input transition, and the time needed for the gate to switch
fully. These parameters are illustrated in Figure 3.
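The cascading nature of electrical masking can be sketched with a deliberately crude attenuation rule. The rule below (a pulse within roughly twice a hypothetical 30 ps gate switching time loses width and amplitude; a narrower pulse is absorbed entirely) is a stand-in for the actual Horowitz and delay degradation equations:

```python
def attenuate(width_ps, amplitude_v, gate_switch_ps=30.0, vdd=1.0):
    # One gate of the chain (stand-in rule): pulses much wider than
    # the switching time pass unchanged; marginal pulses shrink in
    # both width and amplitude; short pulses are fully masked.
    if width_ps <= gate_switch_ps:
        return 0.0, 0.0
    scale = min(1.0, (width_ps - gate_switch_ps) / gate_switch_ps)
    return (width_ps - gate_switch_ps * (1.0 - scale),
            min(vdd, amplitude_v * scale))

# A 50 ps full-rail pulse degrades at every stage and is fully
# masked before it reaches the fourth gate.
width, amp = 50.0, 1.0
for gate in range(1, 5):
    width, amp = attenuate(width, amp)
    print(gate, round(width, 1), round(amp, 3))
```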
4.4 Pulse latching model
Recall that our definition of a soft error in combinational
logic requires that an error pulse is captured in a memory
circuit. In our model, this means that the pulse is stored
into the level-sensitive latch at the end of a pipeline stage.
We only consider a value to be stored in the latch if it is
present (and stable) when the latch closes, since it is this
value that is then passed to the next pipeline stage.
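Before describing the hspice procedure, it helps to see the underlying geometry: a pulse arriving at a uniformly random phase of the clock is captured only if it covers the entire window around the latch-closing edge during which the input must be stable. A simplified closed-form sketch (the d, w, and c values below are illustrative, not measured):

```python
def latch_probability(d, w, c):
    # Probability that a pulse of duration d, uniformly placed in a
    # clock period c, completely covers a latching window of width w:
    # P = (d - w) / c, clipped to [0, 1] (simplified model).
    if d <= w:
        return 0.0
    return min(1.0, (d - w) / c)

# A 60 ps pulse against a 20 ps window in a 250 ps cycle:
print(latch_probability(60.0, 20.0, 250.0))  # 0.16
```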
When a voltage pulse reaches the input of a latch, we use
an hspice simulation to determine if it has sufficient ampli-
tude and duration to be captured by the latch. The simula-
tion is done in two steps. First we determine the pulse start
time, the shortest time between the rising edge of the pulse
and clock edge for which the pulse could be latched. This is
similar to a setup time analysis for the latch, except that the
input data waveform has the slope of the pulse at the latch
input. The second step is to determine the minimum du-
ration (measured at the threshold voltage) pulse that could
be latched. For this step, we position the rising edge of the
pulse at the point determined in the first step, and then vary
the duration until the minimum value is determined. We
studied the nature of the pulse start time and minimum du-
ration using separate experiments and found that the pulse
start time is a linear function of the rise time of the pulse,
and the minimum duration is a linear function of the rise time.
