
Showing papers by "Luca Sterpone published in 2014"



Journal ArticleDOI
TL;DR: A new hybrid technique that monitors the control flow at both the input and the output of the microprocessor and compares them to detect possible errors is proposed; it shows full control-flow error detection with no performance degradation and small area overhead.
Abstract: Hybrid error-detection techniques combine software techniques with an external hardware module that monitors the execution of a microprocessor. The external hardware module typically observes the control flow at the input or at the output of the microprocessor and compares it with the expected one. This paper proposes a new hybrid technique that monitors the control flow at both points and compares them to detect possible errors. The proposed approach does not require any software modification to detect control-flow errors. Fault-injection campaigns have been performed on a LEON3 microprocessor. The results show full control-flow error detection with no performance degradation and small area overhead. A complete solution can be obtained by complementing the proposed approach with software fault-tolerance techniques for data errors.
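The core idea of checking an observed control flow against an expected one can be illustrated with a small, hedged sketch. The code below is not the paper's hardware module; it is a minimal Python model, with hypothetical basic-block identifiers, of a checker that validates a stream of block transitions (as might be seen at two observation points of a processor) against a static control-flow graph.

```python
# Minimal sketch (not the paper's hardware module): a control-flow checker
# that validates observed basic-block transitions against a static CFG.

# Hypothetical control-flow graph: block id -> set of legal successor blocks.
CFG = {
    0: {1, 2},   # entry branches to block 1 or 2
    1: {3},
    2: {3},
    3: {0, 4},   # loop back or exit
    4: set(),    # exit block
}

def check_trace(trace):
    """Return the index of the first illegal transition, or None if the
    whole trace is consistent with the CFG."""
    for i in range(len(trace) - 1):
        src, dst = trace[i], trace[i + 1]
        if dst not in CFG.get(src, set()):
            return i
    return None

def cross_check(fetch_trace, commit_trace):
    """Model of monitoring two observation points: both traces must agree
    with each other and be individually legal."""
    if fetch_trace != commit_trace:
        return "mismatch between observation points"
    bad = check_trace(commit_trace)
    return f"illegal transition at step {bad}" if bad is not None else "ok"

# Example: a bit flip diverts block 1 directly to block 4.
print(cross_check([0, 1, 3, 4], [0, 1, 3, 4]))  # ok
print(cross_check([0, 1, 4],     [0, 1, 4]))    # illegal transition at step 1
```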

18 citations


Journal ArticleDOI
TL;DR: Results show that this method is able to exploit the intrinsic parallelism of the VLIW processor, taming the growth in size and duration of the test program as the processor size grows.
Abstract: Very long instruction word (VLIW) processors are increasingly employed in a large range of embedded signal processing applications, mainly due to their ability to provide high performance with reduced clock rate and power consumption. At the same time, there is an increasing demand for efficient and optimal test techniques able to detect permanent faults in VLIW processors. Software-based self-test (SBST) methods are a consolidated and effective solution to detect faults in a processor both at the end of the production phase and during the operational life; however, when traditional SBST techniques are applied to VLIW processors, they may prove to be ineffective (especially in terms of size and duration), due to their inability to exploit the parallelism intrinsic in these architectures. In this paper, we present a new method for the automatic generation of efficient test programs specifically oriented to VLIW processors. The method starts from existing test programs based on generic SBST algorithms and automatically generates effective test programs able to reach the same fault coverage while minimizing the test duration and the test code size. The method consists of four parametric phases and can deal with different VLIW processor models. The main goal of the paper is to show that, in the case of VLIW processors, it is possible to automatically generate an effective test program able to achieve high fault coverage with minimal test time and required resources. Experimental data gathered on a case study demonstrate the effectiveness of the proposed approach; results show that this method is able to exploit the intrinsic parallelism of the VLIW processor, taming the growth in size and duration of the test program as the processor size grows.
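To make the "exploit the intrinsic parallelism" argument concrete, here is a hedged sketch of the general idea: packing independent self-test instructions into VLIW bundles so that several functional units are exercised per cycle. The instruction names, slot names, and greedy scheduler are illustrative assumptions, not the paper's four-phase generation method.

```python
# Hedged sketch: packing independent self-test instructions into VLIW bundles.
# This is an illustrative greedy scheduler, not the paper's generation method.

# Hypothetical test instructions: (name, functional unit they exercise).
TEST_PROGRAM = [
    ("add_test_1", "ALU0"), ("add_test_2", "ALU1"),
    ("mul_test_1", "MUL0"), ("ld_test_1", "MEM0"),
    ("add_test_3", "ALU0"), ("mul_test_2", "MUL0"),
]

# Hypothetical VLIW issue slots, one per functional unit.
SLOTS = ["ALU0", "ALU1", "MUL0", "MEM0"]

def pack_bundles(tests, slots):
    """Greedily fill each VLIW bundle with at most one test per slot."""
    bundles, pending = [], list(tests)
    while pending:
        bundle, used = {}, set()
        for t in list(pending):
            name, unit = t
            if unit in slots and unit not in used:
                bundle[unit] = name
                used.add(unit)
                pending.remove(t)
        bundles.append(bundle)
    return bundles

for i, b in enumerate(pack_bundles(TEST_PROGRAM, SLOTS)):
    print(f"bundle {i}: {b}")
# A purely serial SBST program would need 6 cycles; here the same tests fit in 2 bundles.
```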

18 citations


Journal ArticleDOI
TL;DR: A simulator of soft errors in the configuration memory of SRAM-based FPGAs, named ASSESS, adopts fault models for SEUs affecting the configuration bits controlling both logic and routing resources; these models have been demonstrated to be much more accurate than the classical fault models adopted by currently available academic and industrial fault simulators.
Abstract: In this paper, a simulator of soft errors (SEUs) in the configuration memory of SRAM-based FPGAs is presented. The simulator, named ASSESS, adopts fault models for SEUs affecting the configuration bits controlling both logic and routing resources; these models have been demonstrated to be much more accurate than the classical fault models adopted by currently available academic and industrial fault simulators. The simulator permits the propagation of faulty values to be traced in the circuit, thus allowing the analysis of the faulty circuit not only by observing its output, but also by studying fault activation and error propagation. ASSESS has been applied to several designs, including the miniMIPS microprocessor, chosen as a realistic test case to evaluate the capabilities of the simulator. The ASSESS simulations have been validated by comparing their results with a fault injection campaign on circuits from the ITC'99 benchmark, resulting in an average error of only 0.1%.
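The distinction between fault activation and error propagation can be illustrated with a toy model. The sketch below is not ASSESS; it only shows, with an invented two-LUT netlist, how flipping one configuration bit of a LUT may or may not propagate to the circuit output depending on the applied inputs.

```python
# Hedged sketch: modelling an SEU in a configuration bit of a LUT and tracing
# its effect through a tiny netlist. Illustration of the fault-model idea only.

import itertools

# A 2-input LUT is configured by 4 truth-table bits (its "configuration memory").
AND_LUT = [0, 0, 0, 1]          # implements a AND b
OR_LUT  = [0, 1, 1, 1]          # implements a OR b

def lut(cfg, a, b):
    return cfg[(a << 1) | b]

def circuit(and_cfg, or_cfg, a, b, c):
    # out = (a AND b) OR c, built from the two LUTs above.
    return lut(or_cfg, lut(and_cfg, a, b), c)

def inject_seu(cfg, bit):
    flipped = list(cfg)
    flipped[bit] ^= 1
    return flipped

# Flip configuration bit 3 of the AND LUT and compare outputs exhaustively.
faulty_and = inject_seu(AND_LUT, 3)
for a, b, c in itertools.product([0, 1], repeat=3):
    good = circuit(AND_LUT, OR_LUT, a, b, c)
    bad = circuit(faulty_and, OR_LUT, a, b, c)
    if good != bad:
        print(f"inputs {a}{b}{c}: fault activated and propagated to the output")
```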

18 citations


Proceedings ArticleDOI
07 Jul 2014
TL;DR: An experimental validation of an updated analytical approach for predicting Single Event Effects (SEEs), based on the analysis of the circuit the FPGA implements, is performed by comparing its results with a fault injection campaign.
Abstract: Predicting soft errors on SRAM-based FPGAs without wasteful, time-consuming, or high-cost procedures has always been a very difficult goal. Among the available methods, we proposed an updated version of an analytical approach to predict Single Event Effects (SEEs) based on the analysis of the circuit the FPGA implements. In this paper, we provide an experimental validation of this approach by comparing the results it provides with a fault injection campaign. We adopted our analytical method for computing the error rate of a design implemented on an SRAM-based FPGA. Furthermore, we compared the obtained soft-error figure with the one measured by fault injection. The experimental analysis demonstrated that the analytical method closely matches the effective soft-error rates, making it a viable solution for soft-error estimation at early design phases.
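The kind of comparison described here can be pictured with a small, hedged sketch: an analytically predicted error rate (a weighted sum over resource classes) placed next to a rate measured by random fault injection. The resource counts, criticality weights, and campaign numbers below are invented placeholders, not the paper's model or data.

```python
# Hedged sketch of the comparison performed: an analytically predicted
# error rate versus one measured by fault injection. Numbers are illustrative.

# Hypothetical per-resource counts and per-resource criticality estimates
# (probability that an upset in that resource class corrupts the output).
resources = {
    # class: (number of configuration bits used, estimated criticality)
    "logic":   (12_000, 0.35),
    "routing": (48_000, 0.12),
}

total_bits = sum(n for n, _ in resources.values())
predicted_rate = sum(n * p for n, p in resources.values()) / total_bits
print(f"analytically predicted error rate: {predicted_rate:.3f}")

# Hypothetical fault-injection campaign outcome: upsets injected at random
# configuration bits, failures observed at the circuit outputs.
injections, failures = 100_000, 16_900
measured_rate = failures / injections
print(f"fault-injection measured rate:     {measured_rate:.3f}")

print(f"absolute difference:               {abs(predicted_rate - measured_rate):.3f}")
```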

15 citations


Proceedings ArticleDOI
26 May 2014
TL;DR: The paper analyzes design features, fail-safe and reconfigurable features oriented to self-adaptive mitigation, and redundancy approaches applied during the design phase, and reports experimental results giving a clear picture of the test data and fault-tolerance robustness.
Abstract: Reconfigurable architectures are increasingly employed in a large range of embedded applications, mainly due to their ability to provide high performance and high flexibility, combined with the possibility to be tuned according to the specific task they address. Reconfigurable systems are today used in several application areas and are also suitable for systems employed in safety-critical environments. The current development trend in this area focuses on using the reconfigurable features to improve the fault tolerance and the self-test and self-repair capabilities of the considered systems. The state of the art of reconfigurable systems is today represented by Very Long Instruction Word (VLIW) processors and by reconfigurable systems based on partially reconfigurable SRAM-based FPGAs. In this paper, we present an overview and accurate analysis of these two types of reconfigurable systems. The paper focuses on design features, fail-safe and reconfigurable features oriented to self-adaptive mitigation, and redundancy approaches applied during the design phase. Experimental results giving a clear picture of the test data and fault-tolerance robustness are detailed and discussed.

14 citations


Proceedings ArticleDOI
26 May 2014
TL;DR: This tutorial presents and discusses different solutions currently available for assessing and implementing the fault tolerance of digital circuits, not only when the complete design description is available but also at the component level, especially when Commercial-off-the-shelf (COTS) devices are selected.
Abstract: Traditionally, heavy-ion radiation effects on digital systems working in safety-critical applications have been of huge interest. Nowadays, due to the shrinking technology process, Integrated Circuits have become sensitive also to other kinds of radiation particles, such as neutrons, which exist at the Earth's surface and affect ground-level safety-critical applications such as automotive or medical systems. The process of analyzing and hardening digital devices against soft errors raises the final cost, due to time-expensive fault injection campaigns and radiation tests, and reduces system performance, due to the insertion of redundancy-based mitigation solutions. The main industrial problem arising is the localization of the critical elements in the circuit in order to apply optimal mitigation techniques. The purpose of this tutorial is to present and discuss different solutions currently available for assessing and implementing the fault tolerance of digital circuits, not only when the complete design description is available but also at the component level, especially when Commercial-off-the-shelf (COTS) devices are selected.
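The "localize the critical elements, then mitigate selectively" workflow mentioned above can be sketched in a few lines. The element names, campaign figures, and hardening budget below are hypothetical; the point is only to show how per-element failure rates from an injection campaign can drive a selective mitigation decision.

```python
# Hedged sketch: rank circuit elements by how often their corruption caused a
# failure in a (hypothetical) injection campaign, then harden only the most
# critical ones within a given budget.

# element name -> (injections targeting it, observed failures)
campaign = {
    "fsm_state_reg": (200, 120),
    "alu_out_reg":   (200, 35),
    "uart_tx_reg":   (200, 4),
    "status_flags":  (200, 60),
}

criticality = {
    name: failures / injections
    for name, (injections, failures) in campaign.items()
}

budget = 2  # harden at most this many elements (e.g., by local redundancy)
selected = sorted(criticality, key=criticality.get, reverse=True)[:budget]

for name in sorted(criticality, key=criticality.get, reverse=True):
    mark = "harden" if name in selected else "leave as is"
    print(f"{name:15s} criticality={criticality[name]:.2f} -> {mark}")
```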

11 citations


Proceedings ArticleDOI
26 May 2014
TL;DR: It is demonstrated that the proposed design flow is able to decrease circuit sensitivity to SEEs by two orders of magnitude while reducing resource overhead by 83% with respect to traditional mitigation approaches.
Abstract: In the present paper, we propose a new design flow for the analysis and the implementation of circuits on Flash-based FPGAs hardened against Single Event Effects (SEEs). The solution we developed is based on two phases: 1) an analyzer algorithm able to evaluate the propagation of SETs through logic gates; 2) a hardening algorithm able to place and route a circuit by means of optimal electrical filtering and selective guard-gate insertion. The effectiveness of the proposed design flow has been evaluated by hardening seven benchmark circuits and comparing the results of different implementation approaches on a 130-nm Flash-based technology. The obtained results have been validated against radiation-beam testing with heavy ions and demonstrate that our solution is able to decrease circuit sensitivity to SEEs by two orders of magnitude while reducing resource overhead by 83% with respect to traditional mitigation approaches.
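The first phase, evaluating how SETs propagate through logic gates, can be pictured with a toy pulse-width model. The gate names, attenuation values, and latching threshold below are invented for illustration; the paper's analyzer works on the real placed-and-routed netlist.

```python
# Hedged sketch of the SET-analysis idea: propagate a transient pulse width
# along a logic path, apply per-gate attenuation, and flag locations where a
# filtering guard gate would be worthwhile. Numbers are illustrative only.

# Path from the struck gate to a flip-flop: (gate name, pulse attenuation in ps).
path = [("nand2_a", 20), ("inv_b", 10), ("nor2_c", 25), ("buf_d", 15)]

INITIAL_PULSE_PS = 180      # width of the SET at the struck node
LATCHING_THRESHOLD_PS = 90  # pulses wider than this at the FF may be captured

def propagate(pulse_ps, path):
    """Attenuate the pulse gate by gate; electrically filtered pulses die out."""
    for gate, attenuation in path:
        pulse_ps = max(0, pulse_ps - attenuation)
        print(f"after {gate:8s}: {pulse_ps:4d} ps")
    return pulse_ps

residual = propagate(INITIAL_PULSE_PS, path)
if residual > LATCHING_THRESHOLD_PS:
    print("residual pulse can be latched -> insert a guard gate / filter on this path")
else:
    print("pulse is filtered along the path -> no hardening needed here")
```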

11 citations


Journal ArticleDOI
TL;DR: This paper analyzes, through neutron irradiation, typical parallel algorithms for embedded GPGPUs and demonstrates that during an FFT execution most errors appear in the stages in which the GPGPU is fully loaded, as the number of instantiated parallel tasks is higher.
Abstract: Thanks to their capability of efficiently executing massive computations in parallel, General Purpose Graphic Processing Units (GPGPUs) have begun to be preferred to CPUs for several parallel applications in different domains. The two most relevant fields in which GPGPUs have recently begun to be employed are High Performance Computing (HPC) and embedded systems. The reliability requirements are different in these two application domains. In order to be employed in safety-critical applications, GPGPUs for embedded systems must be qualified as reliable. In this paper, we analyze through neutron irradiation typical parallel algorithms for embedded GPGPUs and we evaluate their reliability. We analyze how caches and thread distributions affect the GPGPU reliability. The data have been acquired through neutron test experiments performed at the VESUVIO neutron facility at ISIS. The obtained experimental results show that, if the L1 cache of the considered GPGPU is disabled, the algorithm execution is more reliable. Moreover, it is demonstrated that during an FFT execution most errors appear in the stages in which the GPGPU is fully loaded, as the number of instantiated parallel tasks is higher.
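Results of this kind are usually reduced to a cross section, i.e. observed errors divided by accumulated particle fluence, one value per configuration. The sketch below only illustrates that reduction step with invented placeholder numbers; it does not reproduce the paper's measurements.

```python
# Hedged sketch of how neutron-test data of this kind is typically reduced:
# cross section = observed errors / accumulated fluence. Values are invented
# placeholders, not the measurements reported in the paper.

# configuration -> (observed output errors, fluence in neutrons/cm^2)
runs = {
    "L1 cache enabled":  (42, 2.0e11),
    "L1 cache disabled": (11, 2.0e11),
}

for config, (errors, fluence) in runs.items():
    cross_section = errors / fluence          # cm^2 per device
    print(f"{config:18s} sigma = {cross_section:.2e} cm^2")

# A lower cross section with the L1 cache disabled is consistent with the
# qualitative conclusion that disabling the cache improves reliability.
```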

8 citations


Journal ArticleDOI
TL;DR: The results demonstrate an evident reduction of the recovery time, due to fast error detection and selective partial reconfiguration of faulty domains, and the methodology drastically reduces Cross-Domain Errors in Look-Up Tables and routing resources.
Abstract: The rapid adoption of FPGA-based systems in space and avionics demands dependability rules from the design to the layout phases to protect against radiation effects. Triple Modular Redundancy (TMR) is a widely used fault-tolerance methodology to protect circuits implemented on SRAM-based FPGAs against radiation-induced Single Event Upsets (SEUs). The accumulation of SEUs in the configuration memory can cause the TMR replicas to fail, requiring a periodic write-back of the configuration bit-stream. The associated system downtime due to scrubbing and the probability of simultaneous failures of two TMR domains increase with growing device densities. We propose a methodology to reduce the recovery time of TMR circuits with increased resilience to Cross-Domain Errors. Our methodology consists of an automated tool-flow for fine-grain error detection, error-flag convergence, and non-overlapping domain placement. The fine-grain error detection logic identifies the faulty domain using gate-level functions, while the error-flag convergence logic reduces the overwhelming number of flag signals. The non-overlapping placement enables selective domain reconfiguration and greatly reduces the number of Cross-Domain Errors. Our results demonstrate an evident reduction of the recovery time due to fast error detection and selective partial reconfiguration of faulty domains. Moreover, the methodology drastically reduces Cross-Domain Errors in Look-Up Tables and routing resources. The improvements in recovery time and fault tolerance are achieved at an area overhead of a single LUT per majority voter in TMR circuits.
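The fine-grain error detection idea, identifying which TMR domain disagrees with the voted value so that only that domain is reconfigured, can be illustrated with a minimal model. This is a behavioral Python sketch, not the paper's gate-level implementation or tool-flow.

```python
# Hedged sketch of fine-grain error detection in a TMR scheme: a majority
# voter extended with per-domain "disagree" flags, so the faulty replica can
# be identified and selectively repaired.

def tmr_vote_with_flags(a, b, c):
    """Return (voted bit, flags); flags[i] is True if domain i disagrees."""
    voted = (a & b) | (b & c) | (a & c)   # classic bitwise majority
    flags = (a != voted, b != voted, c != voted)
    return voted, flags

# Fault-free case: all domains agree, no flag raised.
print(tmr_vote_with_flags(1, 1, 1))   # (1, (False, False, False))

# Upset in domain B: the vote still masks the error, and the flag pinpoints
# the domain that would be selectively reconfigured.
voted, flags = tmr_vote_with_flags(1, 0, 1)
faulty = [i for i, f in enumerate(flags) if f]
print(f"voted={voted}, faulty domain(s)={faulty}")
```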

7 citations


Proceedings ArticleDOI
20 Oct 2014
TL;DR: The proposed approach exploits run-time partial reconfiguration techniques for fault injection and avoids full net-list re-compilations; the method's feasibility is assessed on carefully selected circuits, and the overhead in terms of area and timing is reported.
Abstract: Hardware fault emulation for Application Specific Integrated Circuits (ASICs) on FPGAs can considerably reduce the time required for fault simulation. This paper presents a methodology to emulate ASIC faults on state-of-the-art FPGAs. The fault emulation is achieved by following a fully automated process consisting of: constrained technology mapping of the ASIC net-list, creation of a fault dictionary, generation of faulty partial bit-streams, and fault emulation. The proposed approach exploits run-time partial reconfiguration techniques for fault injection and avoids full net-list re-compilations. The method's feasibility is assessed on carefully selected circuits, and the overhead in terms of area and timing is reported.
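The shape of such an emulation campaign, iterate over a fault dictionary, load the corresponding pre-generated faulty partial bitstream, run the test set, record detection, can be sketched as below. The `load_partial_bitstream` and `run_testbench` helpers are hypothetical placeholders for board- and flow-specific steps, not a real API, and the fault dictionary entries are invented.

```python
# Hedged sketch of the overall emulation loop over a fault dictionary.

fault_dictionary = {
    # fault id -> file with the pre-generated faulty partial bitstream
    "U1/A_stuck_at_0": "faults/u1_a_sa0.bit",
    "U1/A_stuck_at_1": "faults/u1_a_sa1.bit",
    "U7/Y_stuck_at_0": "faults/u7_y_sa0.bit",
}

def load_partial_bitstream(path):          # placeholder: reconfigure the region
    print(f"  reconfiguring with {path}")

def run_testbench():                       # placeholder: apply patterns, compare
    return {"detected": True}              # outputs against the golden run

def emulate(fault_dictionary):
    detected = 0
    for fault_id, bitstream in fault_dictionary.items():
        print(f"fault {fault_id}:")
        load_partial_bitstream(bitstream)
        if run_testbench()["detected"]:
            detected += 1
    coverage = detected / len(fault_dictionary)
    print(f"fault coverage on this dictionary: {coverage:.0%}")

emulate(fault_dictionary)
```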

Proceedings ArticleDOI
07 Jul 2014
TL;DR: This paper focuses on Control Flow Errors (CFEs) and extends a previously proposed method, based on the usage of the debug interface existing in several processors/controllers, that achieves a good detection capability with very limited impact on the system development flow and reduced hardware cost.
Abstract: Transient faults can affect the behavior of electronic systems and represent a major issue in many safety-critical applications. This paper focuses on Control Flow Errors (CFEs) and extends a previously proposed method based on the usage of the debug interface existing in several processors/controllers. The new method achieves a good detection capability with very limited impact on the system development flow and reduced hardware cost; moreover, the proposed technique does not involve any change either in the processor hardware or in the application software, and works even if the processor uses caches. Experimental results are reported, showing both the advantages and the costs of the method.
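The underlying idea, an external observer watching branch information exposed by a debug/trace interface and checking it against the statically known control flow, can be pictured with a small sketch. The trace record format and address values are invented for illustration; real debug interfaces are processor-specific.

```python
# Hedged sketch of trace-based CFE detection: check each observed taken branch
# against the statically known set of legal targets.

# Static analysis result (hypothetical): branch address -> set of legal targets.
legal_targets = {
    0x1000: {0x1020, 0x1004},   # conditional branch: taken / fall-through
    0x1030: {0x1000},           # loop back-edge
}

# Branch records streamed from the trace port: (branch address, target taken).
trace = [(0x1000, 0x1020), (0x1030, 0x1000), (0x1000, 0x2FF0)]

for branch, target in trace:
    allowed = legal_targets.get(branch)
    if allowed is None or target not in allowed:
        print(f"CFE detected: branch at {branch:#x} jumped to {target:#x}")
        break
else:
    print("trace consistent with the static control-flow graph")
```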

Journal ArticleDOI
TL;DR: This paper presents three new algorithms designed to support radiation experiments evaluating the radiation sensitivity of GPGPU memories, with particular emphasis on the shared memory and on the L1 and L2 data caches.
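Experiments of this kind typically rely on fill/expose/read-back patterns over the memory under test. The sketch below is only a generic pure-Python model of that pattern with simulated upsets; it is not one of the three algorithms the paper presents.

```python
# Hedged sketch of a fill/expose/read-back memory test: write a known pattern,
# keep it resident during irradiation, then read it back and count mismatches.

import random

PATTERN = 0xA5A5A5A5
WORDS = 1024

memory = [PATTERN] * WORDS          # stands in for shared memory / a data cache

# Simulate the exposure phase: a few random single-bit upsets.
random.seed(0)
for _ in range(3):
    word = random.randrange(WORDS)
    bit = random.randrange(32)
    memory[word] ^= (1 << bit)

# Read-back phase: locate and count corrupted words.
errors = [(i, w ^ PATTERN) for i, w in enumerate(memory) if w != PATTERN]
print(f"corrupted words: {len(errors)}")
for index, syndrome in errors:
    print(f"  word {index}: flipped bits mask {syndrome:#010x}")
```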

Proceedings ArticleDOI
01 Oct 2014
TL;DR: A novel technique aimed at further improving the efficiency of the Triple Modular Redundancy (TMR) hardening technique applied at the software level on VLIW processors is presented.
Abstract: VLIW architectures are widely employed in several embedded signal processing applications since they offer the opportunity to obtain high computational performance while maintaining a reduced clock rate and power consumption. Recently, VLIW processors have been considered for employment in various embedded processing systems, including safety-critical ones (e.g., in the aerospace, automotive, and rail transport domains). Terrestrial safety-critical applications based on newer nano-scale technologies raise increasing concerns about transient errors induced by neutrons. Therefore, techniques to effectively estimate and improve the reliability of VLIW processors are of great interest. In this paper, we present a novel technique aimed at further improving the efficiency of the Triple Modular Redundancy (TMR) hardening technique applied at the software level on VLIW processors. In particular, we first experimentally demonstrate that the TMR-based software technique, when applied at the C code level, is not able to cope with most of the failures affecting user logic resources. Then, we propose a method able to analyze and modify the TMR-based code for a generic VLIW processor in order to improve the fault tolerance of the executed application without modifying the VLIW processor. In detail, the proposed technique is able to reduce the number of cross-domain errors affecting the TMR-hardened code of a VLIW processor data path. We provide figures about performance and fault coverage for both the unprotected and protected versions of a set of benchmark applications, thus demonstrating the benefits and limitations of our approach.
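The cross-domain-error concern can be made concrete with a toy example: if operations belonging to two different software TMR replicas end up bound to the same VLIW functional unit, a single fault in that unit can corrupt two domains at once and defeat the majority vote. The schedule, unit names, and re-binding rule below are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch: detect operations of different TMR domains sharing a VLIW
# functional unit and re-bind them so each domain gets its own unit.

# operation -> (TMR domain, functional unit assigned by the original schedule)
schedule = {
    "add_D0": ("D0", "ALU0"),
    "add_D1": ("D1", "ALU0"),   # shares ALU0 with domain D0 -> risky
    "add_D2": ("D2", "ALU1"),
}

UNITS = ["ALU0", "ALU1", "ALU2", "ALU3"]

def rebind(schedule, units):
    """Give each TMR domain its own functional unit where possible."""
    used, fixed = {}, {}
    for op, (domain, unit) in schedule.items():
        if used.get(unit, domain) != domain:   # unit already serves another domain
            unit = next(u for u in units if u not in used)
        used[unit] = domain
        fixed[op] = (domain, unit)
    return fixed

for op, (domain, unit) in rebind(schedule, UNITS).items():
    print(f"{op}: domain {domain} -> {unit}")
```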

Proceedings ArticleDOI
07 Jul 2014
TL;DR: A software debugger-based fault injection mechanism to evaluate the resiliency of applications running on a GPGPU and to validate the software hardening techniques it possibly embeds is proposed.
Abstract: General Purpose Graphic Processing Units (GPGPUs) are more efficient than CPUs for processing parallel data. Unfortunately, GPGPUs are sensitive to radiation. Hence, several software mitigation techniques, as well as robust algorithms, are being developed to overcome reliability problems. In this paper, we propose a software debugger-based fault injection mechanism to evaluate the resiliency of applications running on a GPGPU and to validate the software hardening techniques it possibly embeds. We report experimental results gathered on selected case studies to show the advantages and limitations of the proposed approach.
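The general shape of a debugger-based injection campaign, pause the target, flip one bit in a chosen register or memory word, resume, and classify the outcome, can be sketched as below. The `debugger_*` helpers are hypothetical placeholders for whatever scripting interface the chosen debugger exposes; they do not model any specific tool's commands.

```python
# Hedged sketch of a debugger-based fault injection loop with outcome
# classification (masked / silent data corruption / crash).

import random

def debugger_pause_at_random_point(): print("  paused target")
def debugger_read_value(loc):         return 0x3F800000          # placeholder
def debugger_write_value(loc, v):     print(f"  {loc} <- {v:#010x}")
def debugger_resume_and_wait():       return "completed"         # or "crashed"
def output_matches_golden():          return random.random() > 0.3

random.seed(1)
outcomes = {"masked": 0, "sdc": 0, "crash": 0}

for run in range(5):
    print(f"injection {run}:")
    debugger_pause_at_random_point()
    location = "thread[12].R4"                      # hypothetical target
    value = debugger_read_value(location)
    debugger_write_value(location, value ^ (1 << random.randrange(32)))
    status = debugger_resume_and_wait()
    if status != "completed":
        outcomes["crash"] += 1
    elif output_matches_golden():
        outcomes["masked"] += 1
    else:
        outcomes["sdc"] += 1                        # silent data corruption

print(outcomes)
```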

Proceedings ArticleDOI
26 May 2014
TL;DR: A multi-gigabit bidirectional serial link architecture is implemented by means of Xilinx Virtex-5 FPGAs, and the design issues related to improving its radiation tolerance, together with their impact on link performance in terms of speed, power consumption, area, and latency, are presented.
Abstract: High-speed optical links are often used in trigger and data acquisition (TDAQ) systems of high-energy physics (HEP) experiments for data transfer, triggering, and fast control distribution. Requirements of system integration, flexibility, and re-programmability suggest the use of SERDESes embedded in SRAM-based FPGAs as a communication layer for off-detector electronics. The most attractive link architecture would deploy the same FPGAs and firmware also on-detector. However, on that side the electronic components need to withstand the expected level of radiation. In this work, we focus on a multi-gigabit bidirectional serial link architecture, which we implemented by means of Xilinx Virtex-5 FPGAs. We present the design issues related to the improvement of the radiation tolerance and their impact on the link performance in terms of speed, power consumption, area, and latency. We performed several irradiation tests at INFN Laboratori Nazionali del Sud (Catania, Italy) with a 62-MeV proton beam. We report the results of the irradiation tests and we compare the accumulated configuration errors before failure for each implementation of the design.
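The figure of merit mentioned at the end, accumulated configuration errors before failure, can be illustrated with a short sketch that averages per-run counts for two implementations. All numbers and implementation names are invented placeholders, not the paper's measurements.

```python
# Hedged sketch: compare implementations by the mean number of configuration
# upsets accumulated before the link fails, averaged over irradiation runs.

# implementation -> upsets accumulated before failure in individual runs
runs = {
    "baseline link":         [12, 9, 15, 11],
    "hardened link variant": [85, 102, 93, 77],
}

for impl, upsets in runs.items():
    mean = sum(upsets) / len(upsets)
    print(f"{impl:22s} mean upsets before failure: {mean:.1f}")

# The ratio of the two means gives a first-order, flux-independent measure of
# the robustness gained by the hardening choices.
```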