Q2. What is the purpose of the correlation coefficient method?
The correlation coefficient method is adopted to obtain error probabilities and correlations of primary outputs due to a particle strike at internal nodes.
Q3. What is the way to track a system failure back to the critical effect?
Based on success trees, a variant of the well-known fault trees, the proposed method not only considers multiple transient and permanent faults concurrently; its carefully introduced success-tree structure also makes it possible to track a system failure back to the critical effect.
Q4. What is the main reason for the failure of a system?
Over time and with aging effects taking place, more and more components tend to be permanently defective, making permanent effects the dominant source of failure.
Q5. How can the authors increase the number of iterations to regain the reliability of the communication layer?
For a decreased reliability in which voltage drops of up to 300 mV occur, the authors can react at the application layer by increasing the number of iterations in order to regain communications performance.
Q6. How did the authors simulate the fault injection experiment?
To make the fault injection experiment feasible, the authors used a Mixture Importance Sampling approach to simulate only relevant scenarios.
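The idea behind mixture importance sampling can be illustrated on a toy problem. The sketch below is not the authors' experiment: it assumes a unit-rate exponential random variable, a rare event defined by a hypothetical threshold, and a two-component mixture proposal whose weights and rates are chosen only for illustration. Samples are drawn from the mixture so that the rare region is visited often, and each hit is reweighted by the ratio of the nominal density to the mixture density.

```python
import math
import random


def mixture_is_tail_probability(threshold, n_samples=100_000, mix_weight=0.5,
                                proposal_rate=0.1, seed=0):
    """Estimate P(X > threshold) for X ~ Exp(rate=1) by mixture importance sampling.

    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the mixture: the nominal density with probability mix_weight,
        # otherwise a heavier-tailed exponential that reaches the rare region.
        rate = 1.0 if rng.random() < mix_weight else proposal_rate
        x = rng.expovariate(rate)
        if x > threshold:
            nominal = math.exp(-x)
            proposal = proposal_rate * math.exp(-proposal_rate * x)
            mixture = mix_weight * nominal + (1.0 - mix_weight) * proposal
            # Importance weight: nominal density over mixture density.
            total += nominal / mixture
    return total / n_samples


estimate = mixture_is_tail_probability(10.0)
exact = math.exp(-10.0)  # closed-form tail probability of Exp(1)
print(f"estimate={estimate:.3e}, exact={exact:.3e}")
```

Keeping the nominal density as one mixture component bounds the importance weights (here by 1 / mix_weight), which is a common reason for preferring a mixture over a single shifted proposal.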
Q7. What is the probability distribution of injected charges due to a neutron strike?
The authors assume in the following that the probability distribution of injected charges due to a neutron strike follows an exponential distribution [13]:
$$f_Q(Q_{\mathrm{injected}}) = \frac{1}{Q_s} \exp\left(-\frac{Q_{\mathrm{injected}}}{Q_s}\right) \qquad (3)$$
The parameter $Q_s$ is the charge collection slope due to one neutron strike, which is technology dependent [10].
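As a minimal sketch of what the exponential assumption implies, the snippet below evaluates the density and draws injected charges by inverse transform sampling. The function names and the numeric value of the charge collection slope are hypothetical and in arbitrary units; the source gives no concrete value here.

```python
import math
import random


def charge_pdf(q_injected, q_s):
    """Exponential density of the injected charge with slope q_s."""
    if q_injected < 0:
        return 0.0
    return math.exp(-q_injected / q_s) / q_s


def sample_charge(q_s, rng):
    """Draw one injected charge via inverse transform: Q = -q_s * ln(1 - U)."""
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is finite.
    return -q_s * math.log(1.0 - rng.random())


rng = random.Random(1)
q_s = 10.0  # hypothetical slope value, arbitrary units
samples = [sample_charge(q_s, rng) for _ in range(50_000)]
mean_charge = sum(samples) / len(samples)
print(f"sample mean = {mean_charge:.2f} (the distribution mean equals q_s)")
```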
Q8. What is the probability of arithmetic operations stored in an SRAM memory array?
When operand variables of arithmetic operations are stored in an SRAM memory array, then $P_{\mathrm{word}}(\vec{x}, t)$ describes the probability with which these variables contain erroneous data.
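One simple way to picture a word-level error probability is to combine per-bit error probabilities. The sketch below assumes that bit errors are statistically independent and that a word is erroneous as soon as any bit is wrong; both are assumptions made here for illustration and are not stated in the source, and the function name is hypothetical.

```python
def word_error_probability(bit_error_probs):
    """Probability that at least one bit of a word is erroneous.

    Assumes independent bit errors (an illustrative assumption).
    """
    p_all_correct = 1.0
    for p in bit_error_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("bit error probabilities must lie in [0, 1]")
        p_all_correct *= 1.0 - p
    return 1.0 - p_all_correct


# Example: a 32-bit word where every bit has the same small error probability.
p_word = word_error_probability([1e-6] * 32)
print(f"P_word = {p_word:.3e}")
```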
Q9. What is the effect of hardware errors on the system performance?
In [35], the authors studied the effects of hardware errors in the system memories of a MIMO-BICM receiver on the system's communications performance, because the memories consume a large part of the system's area.
Q10. How is the probability of a faulty data word calculated?
According to Fig. 12, the probability that a faulty data word is read from the cache decreases as expected, while the overall system failure probability only slightly decreases compared to the unprotected cache.
Q11. How many ms will the optimization algorithm suggest?
In this case, the optimization algorithm presented in [30] will suggest a solution of s = 8 and a time window of $T_{\mathrm{TW}}$ = 1.30 ms, which will just meet the aforementioned demand.
Q12. What is the probability of a bit flip in an SRAM cell?
In this section, the authors show how to model the bit flip probabilities in an SRAM array using the generic model from Section 3. A bit flip in an SRAM cell occurs, for example, when a particle strike induces enough charge at a point within the cell to flip the cell's content.
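If "enough charge" is read as exceeding a threshold, and the injected charge is exponentially distributed with slope Q_s, the conditional flip probability given a strike has a closed form, exp(-Q_crit / Q_s). The threshold name `q_crit` and the numeric values below are hypothetical; the source does not name or quantify such a threshold in this passage.

```python
import math


def flip_probability_given_strike(q_crit, q_s):
    """P(Q_injected > q_crit) for an exponential charge distribution with slope q_s.

    q_crit is an assumed flip threshold, introduced only for illustration.
    """
    if q_s <= 0:
        raise ValueError("q_s must be positive")
    if q_crit <= 0:
        return 1.0  # any strike deposits enough charge
    return math.exp(-q_crit / q_s)


# Example with hypothetical values in arbitrary units.
p_flip = flip_probability_given_strike(q_crit=20.0, q_s=10.0)
print(f"P(flip | strike) = {p_flip:.4f}")
```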
Q13. What is the error probability for each component of the pipeline?
The error probability $P_E(c)$ for each pipeline component c is obtained using HW-level reliability methods such as EPP [24], CEP [25], and CLASS [26].