## **Soft Error Rate Analysis for Sequential Circuits**\*

Natasa Miskov-Zivanov, Diana Marculescu
Department of Electrical and Computer Engineering
Carnegie Mellon University
{nmiskov,dianam}@ece.cmu.edu

#### Abstract

Due to reductions in device feature size and supply voltage, the sensitivity of digital systems to radiation-induced transient faults (soft errors) increases dramatically. Intensive research has been done so far on modeling and analyzing the susceptibility of combinational circuits to soft errors, while sequential circuits have received much less attention. In this paper, we present an approach for evaluating the susceptibility of sequential circuits to soft errors. The proposed approach uses symbolic modeling based on BDDs/ADDs and probabilistic sequential circuit analysis. The SER evaluation is demonstrated by a set of experimental results, which show that, for most of the benchmarks used, the SER decreases well below a given threshold (10<sup>-7</sup> FIT) within ten clock cycles after the hit. The results obtained with the proposed symbolic framework are within 4% average error and up to 11000X faster when compared to detailed HSPICE circuit simulation.

#### 1. Introduction

Once regarded as a concern only for space applications, transient faults caused by radiation are becoming a major barrier to the robust design of systems manufactured at advanced technology nodes such as 90nm, 65nm or smaller. High data-integrity and reliability requirements make these faults an extremely important design aspect for microprocessors, as well as for network components. Therefore, protection from radiation-induced transient faults has become as important as other product characteristics such as performance or power consumption [1].

A radiation-induced charged particle passing through a microelectronic device ionizes the material along its path.
The free carriers that are created around the particle track can be affected (attracted or repelled) by an internal electric field of the device and result in an electrical pulse, a *single-event transient* (SET), large enough to disrupt normal device operation. This disruption is not associated with any permanent damage to the device and is thus called a *soft error* or a *single-event upset* (SEU). The effect of soft errors is measured by the *soft error rate* (SER) in FITs (failures-in-time), where one FIT is defined as one failure in $10^9$ hours.

Traditionally, memory elements have been much more sensitive to soft errors than combinational logic circuits. Three factors prevented logic from becoming more susceptible to soft errors:

- logical masking: to be latched, a SET has to propagate on a sensitized path from the location where it originates to a latch;
- electrical masking: due to the electrical properties of the gates the glitch is passing through, it can be attenuated or even completely masked before it reaches the latch;
- latching-window masking: the glitch will be latched only if it reaches the latch and satisfies the setup and hold time conditions.

Reduction in feature sizes and supply voltages allows lower-energy particles to cause a SET. Technology scaling also decreases the impact of the three masking factors on radiation-induced SETs. Reduced logic depth and smaller gate delays decrease attenuation when the glitch propagates through the circuit. Finally, the increase in clock frequency decreases latching-window masking. Thus, SER in logic is increasing with every technology node and is expected to become an issue beyond the 90nm technology node. Moreover, once a SET can propagate freely through a combinational circuit, sequential logic will become very sensitive to such events [2]. This is due to the fact that, once latched, soft errors can propagate through the sequential circuit in subsequent clock cycles and thus affect the outputs of the circuit more than once.
In this work, we estimate the likelihood that a SET in a sequential circuit will lead to errors in clock cycles following the particle hit. Our main goal is to allow for *symbolic modeling* and *efficient estimation* of the susceptibility of a sequential circuit to soft errors.

The rest of this paper is organized as follows. In Section 2 we give an overview of related work and outline the contribution of our work. In Section 3 we briefly review the sequential circuit preliminaries. Section 4 presents the application of Markov chain theory to steady-state *SER* analysis. In Section 5, we describe in more detail our methodology for determining sequential circuit susceptibility to soft errors. In Section 6, we report experimental results for a set of common benchmarks. Finally, in Section 7 we conclude our work and provide some directions for future work.

## 2. Related work

Intensive research has been done so far in the area of analysis of transient faults in both combinational and sequential circuits [1]-[7]. One obvious approach is to inject the fault into a given node of the circuit and simulate the circuit for different input vectors in order to find whether the fault propagates [2],[4],[7]. However, this approach becomes intractable for larger circuits and larger numbers of inputs, and thus gives way to approximate approaches that use analytical and symbolic methods to evaluate circuit susceptibility to soft errors. In this section, we describe existing methods used to find the susceptibility to soft errors of sequential circuits. We also briefly outline the contributions of our work and compare it to previous work.

## 2.1. SER in sequential circuits

Compared to the number of methods proposed for modeling soft error susceptibility of combinational circuits, sequential circuits have received less attention. Most of the previous work in evaluating *SER* in sequential circuits has been done using simulation.
Since sequential circuits have a feedback loop leading back to the state inputs of the circuit, it is possible that errors latched at state lines propagate through the circuit more than once. Thus, the effect of a single particle hit can affect outputs during several clock cycles. To consider this effect, the analysis of the propagation of an SET through a sequential circuit in more than one clock cycle is necessary. In the worst case, this analysis and evaluation would have to consider an infinitely large number of cycles. Therefore, to be able to model and analyze sequential circuit susceptibility to soft errors, we need approximate methods.

\* This work was supported in part by National Science Foundation Grant CCF-0542644.

**Fig. 1.** Example circuit *S27* and results for separate and unified treatment of masking factors for three initial glitch durations (80ps, 100ps and 125ps).

Although there has been a lot of work in the area of modeling the probabilistic behavior of finite state machines (FSMs) [8],[9], the main goal of those methods was calculating the steady-state behavior of the circuit, which can be applied, for example, in estimating the switching activity of the circuit for the purpose of power evaluation. However, in the case of soft errors, steady-state behavior may not be relevant and the transient behavior of the circuit is more important, that is: (i) the time the circuit spends transitioning through erroneous states until it reaches a steady-state behavior; and (ii) the effect this transitioning has on the outputs, that is, the susceptibility to soft errors of the target sequential circuit.

One method that evaluates the probability of latching the error in a sequential circuit, in the cycles following the particle hit, was proposed by Asadi et al. [6]. In that work, the authors assume hits can happen only at state flip-flops and then find the error probability at each output due to each individual flip-flop hit.
This analysis excludes cases where internal gates of the circuit's combinational logic are hit and includes logical masking only. Such an approach does not hold for the case of internal gate hits, when electrical and latching-window masking need to be included as well. Furthermore, the authors report their results in terms of the mean time to manifest error (MTTM), which is a non-standard metric, and not in terms of SER, which is the most common metric for measuring the soft error susceptibility of circuits. Their framework is accurate to within 12% when compared to logic simulation, while ours is accurate to within 4% when compared to HSPICE, at an 11000X speedup. Given the limiting assumptions in [6] and the fact that it does not use standard measures for soft error rate, we compare our work only with circuit-level simulation (HSPICE).

#### 2.2. Paper contribution

As opposed to Markovian analysis approaches [8],[9] that allow only steady-state analysis, the method proposed in this paper allows for both transient and steady-state evaluation of the propagation of SETs and the soft error susceptibility of sequential circuits. From among the methods proposed for soft error susceptibility of combinational circuits, we chose to use the symbolic modeling framework presented in [5], which relies on Binary Decision Diagrams (BDDs) and Algebraic Decision Diagrams (ADDs). Together with an efficient framework for probabilistic analysis of sequential circuits, the main contributions of this paper consist of:

**Fig. 2.** *SER* changes in circuit *S27* during several clock cycles for different input probability distributions.

1. Unified symbolic treatment of all types of masking. The framework proposed in [5] for soft error susceptibility evaluation of combinational circuits was chosen as the basis for sequential circuit analysis because it provides a unified treatment of the three masking factors: logical, electrical and latching-window masking.
More precisely, by using BDDs and ADDs, the information about the masking factors is implicitly generated inside the decision diagrams, therefore including their joint dependency on input patterns and circuit topology. This allows for efficient concurrent computation of output error susceptibility due to hits on various internal nodes. This type of unified treatment is necessary for correctly determining the likelihood of a soft error being registered in the state lines, as well as for providing an exact framework for detecting both transient and steady-state propagation effects of such errors in cycles following the gate hit.

The importance of the *unified* treatment of the three masking factors can be seen from the example in Fig. 1. We consider separately the effect of logical masking, on one hand, and the effect of electrical and latching-window masking, on the other hand, for the ISCAS'89 benchmark S27. The results shown in the table in Fig. 1 represent the minimum, maximum and average relative error of the model that evaluates electrical, latching-window and logical masking separately, compared to the unified model, averaged across ten different input vector probability distributions, for three different initial glitch durations. As can be seen from these results, multiplying the probability of logical masking by the separately computed probability of electrical and latching-window masking leads to an error in the probability of latching the glitch that can be as large as 3100%. However, for a smaller glitch duration (80ps), the average error is not very large, because most glitches are masked. For the case of large initial glitches (125ps), all glitches propagate, and the only difference between the two methods comes from handling the reconvergent paths.

2. Exact and approximate methods for SER estimation in sequential circuits.
To take into account the joint effect of logical, electrical and latching-window masking and, at the same time, to allow for the efficient estimation of the effects in time of a SET on the outputs of the sequential circuit, we rely on our two proposed methods for exact and approximate evaluation of SER in sequential circuits, as described in detail in Sections 4 and 5. The exact method relies on Markov Chain (MC) analysis-based SER estimates following a hit. To cope with potential state explosion/complexity problems associated with this type of analysis and to allow for modeling of transient effects in SER evaluation, we also propose a low-cost, approximate method based on *circuit unrolling*.

For a better understanding of the methodology proposed in this work, we show in Fig. 2 the results obtained using our approximate method for the example circuit S27 for several input vector probability distributions (PD). The results presented in Fig. 2 describe the effect of a particle hit on circuit behavior, that is, the output error probability variation in time. As can be seen, in most cases SER converges to very low values, except for a few cases in which it stays almost constant. This shows that SER transient behavior is heavily dependent on the input distribution, and thus classic MC analysis may not be appropriate for capturing it.

Our framework is not only scalable, but also accurate when compared to detailed circuit simulation. As shown in Section 6.2, the proposed framework is accurate to within 4% when compared to HSPICE, at an 11000X speedup.

## 3. Sequential circuits - preliminaries

A typical sequential circuit consists of combinational logic and flip-flops (FFs), as shown in Fig. 3. The inputs to the combinational logic are the primary inputs and the outputs of the FFs, while the outputs of the combinational logic are the primary outputs and the inputs of the FFs.

**Fig. 3.** A typical sequential circuit.

## 3.1. Finite State Machines

As an abstraction of sequential circuits, we use a finite state machine (FSM). The probabilistic behavior of a sequential circuit is often analyzed using concepts of Markov chain (MC) theory, as described in [8],[9]. A state transition graph (STG) that represents the state transitions of an FSM, given input values, can be transformed into a discrete-parameter MC by attaching to each outgoing edge of each state a label that represents the transition probability. The transition probabilities of the MC for a given circuit can be calculated when the input distribution that exercises the inputs of the FSM is known. It is often required to determine the long-run behavior of MCs, that is, the *limit state probability*. For a given MC, the limit probabilities do not depend on the initial state and are called the *steady-state probabilities* of the MC.

## 3.2. BDD/ADD based modeling of SET

We present in this section the main aspects of the BDD/ADD based analysis of SET propagation and SER evaluation in combinational circuits proposed in [5], which is at the core of our own proposed probabilistic analysis of SER in sequential circuits. The framework in [5] captures all *gate-output* combinations, i.e., it determines the probability of a soft error at any output due to a fault originating at any internal gate. To find the probability that a glitch originating at a gate G is latched at output F, all possible values for the duration and amplitude of a glitch at the output F are found. To determine the probability of having a glitch of duration $D_k$ at that output, BDDs and ADDs are used.
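To make the notion of a gate-output error probability concrete, the following minimal sketch computes the probability that an error at one gate reaches an output, considering logical masking only. It is not the BDD/ADD framework of [5]: it swaps in brute-force enumeration of input vectors, which is only practical for tiny circuits. The circuit, the gate name `G1`, the function names, and the assumption of independent inputs are all hypothetical and chosen for illustration.

```python
from itertools import product


def circuit(a, b, c, flip_g1=False):
    """Hypothetical 3-input circuit: F = (a AND b) OR c, with G1 = a AND b."""
    g1 = a & b
    if flip_g1:
        # Model a full-swing error at the output of gate G1.
        g1 ^= 1
    return g1 | c


def propagation_probability(p_one):
    """Probability that an error at G1 changes F (is not logically masked).

    p_one[i] is the probability that input i equals 1; inputs are
    assumed independent (an assumption of this sketch).
    """
    total = 0.0
    for bits in product((0, 1), repeat=3):
        # Probability of this particular input vector.
        weight = 1.0
        for bit, p in zip(bits, p_one):
            weight *= p if bit else 1.0 - p
        # The error propagates if the faulty and fault-free outputs differ.
        if circuit(*bits) != circuit(*bits, flip_g1=True):
            total += weight
    return total


print(propagation_probability([0.5, 0.5, 0.5]))
```

In this toy circuit the error at `G1` is visible at `F` only when `c = 0`, so the result depends on the input distribution, which is the dependency that the decision diagrams encode symbolically instead of by enumeration.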
For each output $F_j$, and for an initial duration $d_{init}$ and initial amplitude $a_{init}$ at the output of the gate that is hit, the authors in [5] find the mean error susceptibility (MES) as the probability of output $F_j$ failing due to errors at internal gates:

$$MES(F_j^{d_{init}, a_{init}}) = \frac{\sum_{k=1}^{n_f} \sum_{i=1}^{n_G} P(F_j\ fails \mid G_i\ fails \cap init\_glitch = (d_{init}, a_{init}))}{n_G \cdot n_f} \tag{1}$$

where $n_G$ is the cardinality of the set of internal gates of the circuit, $\{G_i\}$, and $n_f$ is the cardinality of the set of probability distributions, $\{f_k\}$, associated with the input vector stream. It has been shown in [5] that the probability of output $F_j$ failing, $P(F_j)$, can be defined using the MES metric, and the SER for a given output can then be computed using the expression from [5] as:

$$SER_{F_j} = P(F_j) \cdot R_{eff} \cdot R_{PH} \cdot A_{circuit} \tag{2}$$

where $R_{PH}$ is the particle hit rate per unit of area, $R_{eff}$ is the fraction of particle hits that result in charge generation, and $A_{circuit}$ is the total silicon area of the circuit.

Once $P(F_j)$ is computed for every output (including state lines), one can use the error probability for the state lines to determine the steady-state and time-dependent behavior of error propagation in the sequential circuit.

## 4. Markov chain theory for steady-state SER analysis

As described in Section 3.1, the probabilistic behavior of a sequential circuit can be analyzed using MC theory. Therefore, it is natural to consider using MC analysis for probabilistic analysis of sequential circuit soft error susceptibility. In the approaches used in [8],[9], it was shown how to calculate the steady-state behavior of FSMs by means of MC analysis. We describe here one possible method that uses MCs for SER analysis. We propose to modify the original sequential circuit as shown in Fig. 4.
The new circuit consists of two copies of the combinational logic of the original circuit, Combinational logic (gold), $CL_1$, and Combinational logic (hit), $CL_2$. Logic $CL_1$ is used to collect the information about the correct behavior of the circuit, having as inputs the primary input vector $(PI^1)$ and the correct present-state vector $(PS^1)$, and as outputs the correct primary output vector $(PO^1)$ and the correct next-state vector $(NS^1)$. On the other hand, circuit $CL_2$ has as inputs the primary input vector $(PI^2$, where $PI^2 \equiv PI^1)$ and possibly erroneous present-state lines $(PS^2)$, and as outputs the possibly erroneous primary output vector $(PO^2)$ and next-state vector $(NS^2)$. We can define the next-state vectors of the *gold* and *hit* circuits as:

$$NS^{1} = \boldsymbol{\delta}^{1} = (\delta_{1}^{1}, \delta_{2}^{1}, ..., \delta_{m}^{1}) \quad \text{and} \quad NS^{2} = \boldsymbol{\delta}^{2} = (\delta_{1}^{2}, \delta_{2}^{2}, ..., \delta_{m}^{2})$$

where vectors $\boldsymbol{\delta}^1$ and $\boldsymbol{\delta}^2$ can take values from the finite set S of the states of the original circuit, and m is the number of state variables.
The modified circuit has a new state vector consisting of the state lines (variables) of the original (gold) circuit and an error vector $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, ..., \varepsilon_m)$:

$$NS^{modified} = (\boldsymbol{\delta}^1, \boldsymbol{\varepsilon}) = (\delta_1^1, \delta_2^1, ..., \delta_m^1, \varepsilon_1, \varepsilon_2, ..., \varepsilon_m)$$

The error vector $\boldsymbol{\varepsilon}$ is defined as:

$$\boldsymbol{\varepsilon} = \boldsymbol{\delta}^1 \oplus \boldsymbol{\delta}^2 = (\delta_1^1 \oplus \delta_1^2, \delta_2^1 \oplus \delta_2^2, ..., \delta_m^1 \oplus \delta_m^2)$$

and can take values from the finite set E representing possible errors in the state lines of the original circuit. In other words, $\varepsilon_i = 1$ when there is an error in state line $\delta_i$, and $\varepsilon_i = 0$ otherwise, for i=1,2,...,m. The $PS^2$ vector at the input of $CL_2$ is then obtained by XOR-ing the $PS^1$ vector $\boldsymbol{\delta}^1$ and the error vector $\boldsymbol{\varepsilon}$.

**Fig. 4.** Circuit model used to perform Markov chain analysis for a given sequential circuit.

The main goal of the soft error susceptibility analysis for sequential circuits is to find the transition probabilities between the erroneous states from the set E and, from there, to determine the behavior of the sequential circuit when the soft error occurs. In other words, we are interested in finding the steady-state probability distribution for the error vector $\boldsymbol{\varepsilon}$. This can be found from the probability vector $\boldsymbol{\pi}^{modified}$ representing the steady-state distribution for the modified circuit by summing the probabilities $\pi^{modified}_{i,j} = \Pr(\boldsymbol{\delta}^1 = i, \boldsymbol{\varepsilon} = j)$ over all vectors that have the same values for $\boldsymbol{\varepsilon}$:

$$\pi_{j}^{error} = \sum_{i} \pi_{i,j}^{modified} = \sum_{i} \Pr(\boldsymbol{\delta}^{1} = i, \boldsymbol{\varepsilon} = j) = \Pr(\boldsymbol{\varepsilon} = j) \tag{3}$$

We find the STGs for the given original circuit and for its modified version shown in Fig. 4. From the STGs of both circuits, and given the input vector probability distribution and particle hit probability, we can find their corresponding MCs. Thus, given the set of states $\{(\boldsymbol{\delta}^1, \boldsymbol{\varepsilon})\}$ and transition probabilities for the modified circuit, $P^{modified}$, and given the initial error state probability $\boldsymbol{\varepsilon}(0)$, by using MC theory we can determine the behavior of the sequential circuit after a soft error occurs. Starting with the initial probability distribution for the state vector $(\boldsymbol{\delta}^1, \boldsymbol{\varepsilon})$, we can apply various techniques (e.g., the power method) on the transition probability matrix $P^{modified}$ to determine the steady-state behavior, under given state error probabilities.

Working with the full (modified) MC can be prohibitive in terms of cost. While this approach is feasible for small benchmarks (like S27, where the modified FSM has 64 states), it can become inapplicable for larger benchmarks. Since we are interested in transitions between erroneous states only, one possible solution to the complexity problem is to use an approximation of the transition probability matrix $P^{modified}$. An example of such a method is to partition and aggregate the states with the same $\boldsymbol{\varepsilon}$ vector values such that the size of the matrix $P^{modified}$ decreases. This method has been previously used in power analysis and evaluation of sequential circuits [8],[9]. Due to space constraints, we cannot present the aggregation approach here.
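The two steps described above, iterating the transition matrix with the power method and then marginalizing over the gold state as in equation (3), can be sketched as follows. This is a minimal illustration under stated assumptions: the four-state chain, its transition probabilities, and the function names `power_method` and `error_marginal` are hypothetical and are not taken from any benchmark; the chain contains no transitions from error-free to erroneous states, which mirrors a single hit followed by no further hits.

```python
def power_method(P, pi0, tol=1e-12, max_steps=10_000):
    """Iterate pi <- pi * P until the largest per-state change is below tol."""
    pi = list(pi0)
    n = len(pi)
    for step in range(1, max_steps + 1):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt, step
        pi = nxt
    raise RuntimeError("power method did not converge")


def error_marginal(pi, states):
    """Equation (3): sum the joint probabilities over delta for each epsilon."""
    marginal = {}
    for prob, (_delta, eps) in zip(pi, states):
        marginal[eps] = marginal.get(eps, 0.0) + prob
    return marginal


# Hypothetical modified chain with one state bit and one error bit.
states = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (delta, epsilon)
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.3, 0.2, 0.2],  # a latched error is masked with probability 0.6
    [0.3, 0.3, 0.2, 0.2],
]
pi0 = [0.0, 0.0, 0.5, 0.5]  # an error has just been latched

pi, steps = power_method(P, pi0)
print(steps, error_marginal(pi, states))
```

For an m-bit state vector the modified chain has up to $2^{2m}$ states, so the dense matrix used here grows quickly; that growth is the cost concern raised in the text.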
Although established and easy to use, MC analysis has one major drawback: while it allows for the evaluation of long-term or steady-state behavior of the sequential circuit, it falls short in the following ways when applied to *SER* estimation:

- It cannot capture the effect of the error on the outputs of the circuit as a function of time; it only estimates the steady-state distribution;
- It cannot include the effect of electrical and latching-window masking, and instead can model only logical masking, unless information is available about the likelihood of a latched error in a state line after a particle hit;
- It becomes impractical for analyzing circuits with a larger number of state lines, and thus an exponentially larger number of states. One possible solution is to use approximation techniques such as aggregation or Monte Carlo simulation, but this can affect the accuracy of the method.

In the next section, we present a practical solution to this problem.

# 5. A practical approach for time-dependent *SER* analysis

In order to estimate the probability of errors in sequential circuits in an efficient manner that captures both transient and steady-state effects while easily incorporating the joint impact of logical, electrical, and latching-window masking, we propose to use unrolling of the sequential circuit, as shown in Fig. 5. Such a framework allows for efficient time-dependent analysis of the effect of a SET on the outputs of a sequential circuit.

When the glitch occurs either at state lines $PS^1$ or at the output of some internal gate of the combinational logic, it can have a duration much shorter than the clock period and an amplitude smaller than $V_{dd}$, and thus be affected by electrical and latching-window masking. If the glitch results in an error in a FF, it will be further propagated as a full-cycle error and will only be logically masked when not on a sensitized path.
Therefore, in all sub-stages following the cycle when the hit occurred, we can use the framework from [5] to analyze the soft error behavior, but we only need to incorporate logical masking effects. Thus, the k-unrolled circuit has two main stages: Stage I, the 1<sup>st</sup> cycle (during which the hit occurs), and Stage II, the 2<sup>nd</sup> to k<sup>th</sup> cycles (sub-stages).

We can then find the probability of error at each output and each next-state line in Stage I as described in [5]. In Stage II, we can lump the logic of sub-stages 2 to k into a single logic circuit. Stage-II logic will have (k-1) times more inputs and (k-1) times more outputs. We can then find the probability of error for each pair (*state line*, *output*), that is, the probability that the wrong value is latched at the output, given that it occurred at the *state line*. Therefore, the probability of error at each output of Stage II is a conditional probability, given that an error did occur at the *state line*. For a given input probability distribution, we find these probabilities using the symbolic framework described in Section 3.2, as follows:

$$P(F_j^{k,d_{init},a_{init}}) = \sum_l P(F_j^k \mid F_l^{1,d_{init},a_{init}}) \, P(F_l^{1,d_{init},a_{init}}) \tag{4}$$

where $P(F_j^{k,d_{init},a_{init}})$ is the probability of output j at stage k failing, given an initial glitch duration $d_{init}$ and amplitude $a_{init}$, and $P(F_j^k \mid F_l^{1,d_{init},a_{init}})$ is the probability of error at output j at stage k, given that an error was latched at state line l after the first stage, with:

$$P(F_{l}^{1,d_{init},a_{init}}) = \frac{\sum_{i=1}^{n_{G}} P(F_{l}\ fails \mid G_{i}\ fails \cap init\_glitch = (d_{init},a_{init}))}{n_{G}}$$

the probability of error at state line l.

**Fig. 5.** k-times unrolled sequential circuit divided into two main stages: Stage I and Stage II. Stage II is further subdivided into k-1 sub-stages ($PI^l$: primary inputs of the l<sup>th</sup> sub-stage, $PO^l$: primary outputs of the l<sup>th</sup> sub-stage, $PS^l$: present state of the l<sup>th</sup> sub-stage, $NS^l$: next state of the l<sup>th</sup> sub-stage, RS: state line buffers). In Stage I, all three masking effects (L, E, LW: logical, electrical and latching-window masking, respectively) are modeled, while in Stage II only logical masking (L) needs to be considered.

```
STAGE I:
compute initial probabilities {
    set technology parameters;
    parse input netlist;
    create gate node list;
    sort gates topologically;
    pass through the sorted list, create all ADDs;
    for each output and each next state line
        for each gate and each state-line
            compute the probability of error; // eq. (2)
}

STAGE II - unrolling:
compute final probabilities (k) {
    create k-unrolled circuit gate netlist;
    sort gates topologically;
    pass through the sorted list, create all BDDs;
    for each output {
        for each state-line
            compute the probability of error; // condit.
        compute final probability of error; // eq. (4)
    }
}
```

**Fig. 6.** The algorithm for Stage I initial error probability computation and Stage II final error probability computation.

It is important to note here that we need to assume only a hit in Stage I of the unrolled circuit and no hits in the consecutive cycles. According to [2],[5], particle hits are sufficiently rare and therefore this assumption is realistic. The probability $P(F_j^{k,d_{init},a_{init}})$ can be averaged across input probability distributions to find MES as in equation (1). As described in [5], the MES value can further be used to find the probability $P(F_j^k)$ of output j failing at sub-stage k and then to compute SER as in equation (2).

We note that, in Stage I, a single pulse can result in an error on more than one state line.
An exact approach would be to use the global state vector probability distribution and take into account the correlation of errors on state lines, instead of using individual state-line probability distributions. Obviously, the assumption we make leads to an approximation of the output error probability estimate. However, it has been suggested [10] that accurate results using this approach could be obtained by unrolling the logic an infinitely large number of times. This is impractical, but it has been shown [10] that, for the case of switching activity estimation, unrolling the circuit a finite number of times, k, leads to insignificant approximation error. More specifically, when using k=2, the average error per gate is found to be 2%. In our experiments, we use on average ten unrolled stages for each benchmark and thus we expect to decrease this error even further.

Since the analysis of the circuit that we propose is probabilistic in nature, we use a given initial input vector probability distribution for determining the output error. More specifically, the input vector for Stage II of the unrolled circuit is comprised of inputs $PI^2$ to $PI^k$ to sub-stages 2 to k (which are characterized by the same input probability distribution as $PI^{1}$) and $PS^2$, which are the present-state lines after being affected by a possible particle hit in Stage I. The probability distribution characterizing both $PS^1$ and $PS^2$ is determined by steady-state analysis of the original sequential circuit (e.g., using MC analysis as mentioned in Section 3.1 and described in [8],[9]), while any potential state line error probabilities are determined by using the approach described in Section 3.2. Thus, the Stage II circuit can now be analyzed for individual latched errors on state lines using the approach in Section 3.2, but relying only on logical masking effects. The algorithm for this approach is given in Fig. 6.

## 6. Experimental results

In this section, we first compare the results obtained using MC analysis and the HSPICE simulator with the results obtained using our framework on a small example circuit, *S27*. Then, we show the results of our symbolic model for seven sequential circuits, given different glitch durations and different sets of input probabilities.

The technology used is the 70nm Berkeley Predictive Technology Model [11]. The clock cycle period used is 250ps, and the setup and hold times for the latches are assumed to be 10ps each. $V_{dd}$ is assumed to be 1V. The delay of an inverter in the given technology is determined by simulating a ring oscillator in HSPICE and found to be 6.5ps. The delays for other gates are found by using the logical and electrical effort methodology [12]. The benchmark circuits are chosen from the ISCAS'89 suite. The symbolic modeling framework is implemented in C++ and run on a 3GHz Pentium 4 workstation running Linux.

**Table 1.** Comparison of number of steps to reach the (approximate) steady state and error relative to MC analysis for circuit unrolling.

|                    | MC analysis | unrolling |
|--------------------|-------------|-----------|
| no. steps          | 163         | 10        |
| relative error [%] | 0           | 3E-6      |

## 6.1. MC analysis vs. circuit unrolling

We compared the MC analysis (power method) with the unrolling of sequential circuits on benchmark *S27* for ten different input probability distributions. In Table 1, we show the maximum number of steps needed for the power method applied on the transition matrix of the modified circuit ($P^{modified}$) to converge to the steady-state distribution (column "MC analysis"), and the number of sub-stages of the circuit in the unrolling method (column "unrolling") needed to reach a *SER* value smaller than a given threshold ($10^{-7}$ FIT). We also show the error in the steady-state probability distribution for the proposed unrolling method when compared to the MC analysis method.
As can be seen, circuit unrolling provides sufficiently accurate results in roughly an order of magnitude fewer steps (10 versus 163).

#### 6.2. Symbolic modeling vs. simulation

We use HSPICE simulation to evaluate the accuracy of the results we obtain using the approximate symbolic model of the circuit. In Fig. 7, we show the relative error and relative speedup of our model when compared to HSPICE simulation for benchmark circuit *S27* for several initial glitch durations ranging from 40ps to 120ps, assuming exhaustive input sets and considering all gate-output pairs. We find the relative error of our model for a given initial glitch size as:

$$relative\_error = \frac{\sum\limits_{k=1}^{n_{V}}\sum\limits_{i=1}^{n_{G}}\sum\limits_{j=1}^{n_{F}}\left|D_{symbolic}^{ijk} - D_{HSPICE}^{ijk}\right|/D_{HSPICE}^{ijk}}{n_{G}\cdot n_{F}\cdot n_{V}}$$

where $n_G$ is the number of gates as in equation (1), $n_F$ is the number of outputs, $n_V$ is the number of input vectors, and $D^{ijk}_{symbolic}$ and $D^{ijk}_{HSPICE}$ are the durations of the glitch for input vector k and the gate-output pair $G_i$-$F_j$, found using our model and HSPICE, respectively. Note that this error includes a *node-by-node* analysis and not just a lumped SER comparison. As can be seen from Fig. 7, the error stemming from the approximate gate delay model and the attenuation model we are using ranges from less than 1% to about 12% in one instance (40ps glitch), while averaging 4% overall for an effective **5500X** average speedup (up to **11000X** in some cases).

**Fig. 7.** Comparison of results obtained from HSPICE simulation and symbolic method on benchmark circuit *S27*.

**Fig. 8.** SER changes in circuits S444 and S1196 during five clock cycles for different input probability distributions.

## 6.3. SER evaluation

In Table 2, we present SER for several ISCAS'89 benchmark circuits.
The allowed interval for the initial duration of the glitch is assumed to be $(d_{\min}, d_{\max}) = (60, 140)$ ps, while the initial amplitude is in the range $(a_{\min}, a_{\max}) = (0.8, 1)$ V. For glitches shorter than 60ps, all benchmark circuits (except for a few that have a very small number of gates) have output error induced mostly by output gates and their fanin gates in Stage I, so we use this duration as the lower bound of our interval. Similarly, as already explained, for glitches longer than 140ps all benchmarks propagate almost all the glitches, and thus we use this as the upper bound. MES for each output is found within these allowed intervals at incremental steps $\Delta d = 20$ ps and $\Delta a = 0.1$ V. The $R_{PH}$ used is 56.5 m<sup>-2</sup>s<sup>-1</sup>, $R_{eff}$ is 2.2·10<sup>-5</sup>, and the total silicon area for each benchmark circuit is derived as a function of gate count.

The SER values are computed for each output as described in Section 3.2. The minimum, maximum and average values presented in Table 2 are then found over all output SER values and all sub-stages for a given circuit. As can be seen, the SER behavior differs among the benchmark circuits: the SER either decreases very fast (e.g., for circuits *S1196* and *S1238*) or stays at about the same level for all ten clock cycles for which the circuit is unrolled (e.g., for circuit *S208*). This difference in the number of cycles needed for the SER values to dissipate stems from the different functionality and logical masking behavior of the circuits under consideration, as well as from the number of state lines that can drive errors back to the state line inputs of the circuit. For benchmarks in which the SER remained at about the same level, the unrolling was terminated when the difference between the SER values in two consecutive cycles was less than 10<sup>-7</sup> FIT.
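The sweep and the stopping rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `sweep_grid` and `unroll_stop_cycle`, the way the two stopping conditions are combined, and the example SER sequences are all hypothetical; only the interval bounds, the step sizes and the 10<sup>-7</sup> FIT threshold come from the text.

```python
def sweep_grid(d_min=60, d_max=140, step_d=20, a_min=0.8, a_max=1.0, step_a=0.1):
    """Enumerate (duration [ps], amplitude [V]) pairs over the allowed intervals."""
    durations = list(range(d_min, d_max + 1, step_d))
    n_a = round((a_max - a_min) / step_a)
    # Rounding keeps the amplitude values free of floating-point noise.
    amplitudes = [round(a_min + k * step_a, 10) for k in range(n_a + 1)]
    return [(d, a) for d in durations for a in amplitudes]


def unroll_stop_cycle(ser_per_cycle, threshold=1e-7):
    """Return the first cycle (1-based) at which unrolling could stop, else None.

    Assumed rule: stop when the SER drops below the threshold, or when the
    SER changes by less than the threshold between two consecutive cycles.
    """
    prev = None
    for cycle, ser in enumerate(ser_per_cycle, start=1):
        if ser < threshold:
            return cycle
        if prev is not None and abs(ser - prev) < threshold:
            return cycle
        prev = ser
    return None


grid = sweep_grid()
print(len(grid))  # 5 durations x 3 amplitudes

# Hypothetical SER sequences in FIT, not measured data.
decaying = [2e-3, 5e-4, 1e-5, 1e-8]
flat = [2.4e-3, 2.43e-3, 2.43e-3 + 5e-8]
print(unroll_stop_cycle(decaying), unroll_stop_cycle(flat))
```

The first condition corresponds to circuits whose SER dissipates quickly, while the second covers circuits whose SER settles at a nonzero level.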
The results for one small benchmark, *S444* (153 gates, 3 inputs), and one larger benchmark, *S1196* (487 gates, 14 inputs), are presented in Fig. 8. As can be seen from Fig. 8, both circuits converge to the steady state within five clock cycles after the hit. The only difference between these two circuits is the magnitude of *SER*.

**Table 2.** Minimum, maximum and average *SER* for the range of glitch durations, average number of clock cycles needed to reach the steady state, and the number of sub-stages and probability distributions (PDs) used, run time and memory usage for three glitch durations: small (60ps), medium (100ps) and large (140ps).

| Bench. | no. gates | no. inputs |    | no. NSs | SER min [FIT] | SER max [FIT] | SER average [FIT] | no. cycles | glitch size | no. sub-stages & PDs | run time (s) | memory usage (MB) |
|--------|-----------|------------|----|---------|---------------|---------------|-------------------|------------|-------------|----------------------|--------------|-------------------|
| S27    | 10        | 4          | 1  | 3       | 3e-8          | 0.00397       | 0.00060           | 10         | small       | 10,10                | 0.027        | 1.4               |
|        |           |            |    |         |               |               |                   |            | medium      | 10,10                | 0.028        | 57.4              |
|        |           |            |    |         |               |               |                   |            | large       | 10,10                | 0.032        | 57.4              |
| S208   | 68        | 10         | 1  | 8       | 0.00192       | 0.00303       | 0.00243           | 10+        | small       | 10,10                | 994          | 61                |
|        |           |            |    |         |               |               |                   |            | medium      | 10,10                | 1000         | 67.2              |
|        |           |            |    |         |               |               |                   |            | large       | 10,10                | 1000         | 67.2              |
| S298   | 86        | 3          | 14 | 14      | 1.67e-7       | 0.00344       | 0.00148           | 10+        | small       | 10,10                | 6900         | 71.4              |
|        |           |            |    |         |               |               |                   |            | medium      | 10,10                | 6900         | 71.4              |
|        |           |            |    |         |               |               |                   |            | large       | 10,10                | 6950         | 71.4              |
| S444   | 153       | 3          | 2  | 21      | 0             | 2.43e-5       | 3.11e-6           | 5          | small       | 5,10                 | 385          | 61.8              |
|        |           |            |    |         |               |               |                   |            | medium      | 5,10                 | 365          | 61.8              |
|        |           |            |    |         |               |               |                   |            | large       | 5,10                 | 360          | 61.8              |
| S526   | 165       | 3          | 21 | 21      | 6.76e-7       | 0.00200       | 0.00071           | 5+         | small       | 5,10                 | 570          | 14.3              |
|        |           |            |    |         |               |               |                   |            | medium      | 5,10                 | 551          | 18.0              |
|        |           |            |    |         |               |               |                   |            | large       | 5,10                 | 550          | 18.0              |
| S1196  | 487       | 14         | 13 | 18      | 0             | 0.00240       | 0.00021           | 5          | small       | 5,10                 | 57           | 17.4              |
|        |           |            |    |         |               |               |                   |            | medium      | 5,10                 | 68           | 20.0              |
|        |           |            |    |         |               |               |                   |            | large       | 5,10                 | 61           | 20.5              |
| S1238  | 540       | 14         | 13 | 18      | 0             | 0.00214       | 0.00050           | 4          | small       | 5,10                 | 64           | 15.1              |
|        |           |            |    |         |               |               |                   |            | medium      | 5,10                 | 70           | 15.9              |
|        |           |            |    |         |               |               |                   |            | large       | 5,10                 | 71           | 15.9              |

## 7. Conclusion

In this paper, we presented a symbolic modeling methodology for efficient estimation of the soft error susceptibility of a sequential circuit. We have demonstrated the efficiency of our method by comparing it to detailed HSPICE circuit simulation and by applying it to a subset of *ISCAS'89* benchmarks of various complexities. In future work, we plan to extend this framework so that it can be used to analyze *SER* mitigation techniques for sequential circuits.

## 8. References

- [1] S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim. *Robust System Design with Built-In Soft-Error Resilience*. In IEEE Computer Magazine, Vol. 28, No. 2, pp. 43-52, February 2005.
- [2] R. C. Baumann. *Soft Errors in Advanced Computer Systems*. In IEEE Design and Test of Computers, Vol. 22, Issue 3, 2005.
- [3] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi. *Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic*. In Proc. of International Conference on Dependable Systems and Networks, pp. 389-398, 2002.
- [4] J. F. Ziegler et al. *IBM Experiments in Soft Fails in Computer Electronics (1978-1994)*. In IBM Journal of Research and Development, Vol. 40, pp. 3-18, 1996.
- [5] N. Miskov-Zivanov and D. Marculescu. *MARS-C: Modeling and Reduction of Soft Errors in Combinational Circuits*. In Proc. of Design Automation Conference (DAC'06), pp. 767-772, July 2006.
- [6] G. Asadi and M. B. Tahoori. *Soft Error Modeling and Protection for Sequential Elements*. In Proc. of IEEE Symposium on Defect and Fault Tolerance (DFT) in VLSI Systems, pp. 463-471, October 2005.
- [7] P. Dodd. *Basic Mechanisms and Modeling of Single-Event Upset in Digital Microelectronics*. IEEE Transactions on Nuclear Science, Vol. 50, No. 3, pp. 583-602, June 2003.
- [8] D. Marculescu, R. Marculescu, and M. Pedram. *Trace-Driven Steady-State Probability Estimation in FSMs with Application to Power Estimation*. In Proc. of IEEE Design, Automation and Test in Europe Conf. (DATE), February 1998.
- [9] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. *Markovian Analysis of Large Finite State Machines*. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 15, No. 12, pp. 1479-1493, December 1996.
- [10] C. Y. Tsui, J. Monteiro, M. Pedram, S. Devadas, and A. M. Despain. *Power Estimation Methods for Sequential Logic Circuits*. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 3, pp. 404-416, September 1995.
- [11] Berkeley Predictive Technology Model (BPTM): http://www-device.eecs.berkeley.edu/~ptm.
- [12] I. Sutherland, B. Sproull, and D. Harris. *Logical Effort: Designing Fast CMOS Circuits*. Morgan Kaufmann Publishers, Inc., pp. 5-15 and 63-73, 1999.