
Showing papers by "Cristiana Bolchini published in 2010"


Journal ArticleDOI
TL;DR: This letter proposes a classification algorithm to discriminate between recoverable and non-recoverable faults occurring in static random access memory (SRAM)-based field-programmable gate arrays (FPGAs), with the aim of enabling the exploitation of these devices in space applications as well, which are typically characterized by long mission times where permanent faults become an issue.
Abstract: This letter proposes a classification algorithm to discriminate between recoverable and non-recoverable faults occurring in static random access memory (SRAM)-based field-programmable gate arrays (FPGAs), with the final aim of devising a methodology that enables the exploitation of these devices in space applications as well, typically characterized by long mission times, where permanent faults become an issue. Starting from a characterization of radiation effects and aging mechanisms, we define a controller able to classify such faults and consequently apply the appropriate mitigation strategy.
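The abstract gives no code; the sketch below is only a rough illustration of the classification idea, assuming a fault is deemed recoverable (e.g., an SEU in configuration memory) if it disappears after re-writing the affected frame, and permanent otherwise. The callbacks, retry threshold, and names are hypothetical, not the authors' algorithm.

    # Hypothetical sketch: classify a detected fault as recoverable (fixed by
    # scrubbing/partial reconfiguration) or permanent (e.g., aging damage).
    # The repair/check callbacks and the retry threshold are illustrative assumptions.

    def classify_fault(rewrite_frame, frame_still_faulty, max_repair_attempts=3):
        """Return 'recoverable' if the fault disappears after re-writing the
        affected configuration frame, 'permanent' otherwise."""
        for _ in range(max_repair_attempts):
            rewrite_frame()                # scrub / partially reconfigure the frame
            if not frame_still_faulty():   # readback comparison against golden data
                return "recoverable"
        return "permanent"                 # error persists: treat as a hard fault


    if __name__ == "__main__":
        # Toy model: the fault vanishes after the second scrubbing cycle.
        state = {"faulty": True, "scrubs": 0}

        def rewrite_frame():
            state["scrubs"] += 1
            if state["scrubs"] >= 2:
                state["faulty"] = False

        print(classify_fault(rewrite_frame, lambda: state["faulty"]))  # -> recoverable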

32 citations


Proceedings ArticleDOI
06 Oct 2010
TL;DR: The Reconfiguration Controller, the focus of this paper, is the main component in charge of implementing the proposed fault management strategy; key contributions include a distributed control architecture that avoids single points of failure.
Abstract: This paper proposes the design of a controller managing the fault tolerance of multi-FPGA platforms, contributing to the creation of a reliable system featuring high flexibility and resource availability. A fault management strategy that exploits the devices' reconfiguration capabilities is proposed; the Reconfiguration Controller, the focus of this paper, is the main component in charge of implementing this strategy. The innovative points raised by this work are 1) the identification of a distributed control architecture that avoids single points of failure, 2) the management of both recoverable and non-recoverable faults, and 3) the definition of an overall reliable multi-FPGA system.
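As a loose illustration of the kind of policy such a controller could apply (recoverable faults repaired in place, non-recoverable faults handled by relocating the module to a spare device), here is a minimal hypothetical sketch; the data structures and policy are assumptions, not the controller described in the paper.

    # Hypothetical sketch of a reconfiguration controller's decision logic in a
    # multi-FPGA system: recoverable faults are repaired in place, non-recoverable
    # faults cause the affected module to be relocated onto a spare device.

    from dataclasses import dataclass, field

    @dataclass
    class MultiFpgaSystem:
        placement: dict            # module name -> FPGA id
        spares: list               # FPGA ids still usable as relocation targets
        failed: set = field(default_factory=set)

        def handle_fault(self, module, recoverable):
            device = self.placement[module]
            if recoverable:
                return f"partially reconfigure '{module}' in place on FPGA {device}"
            # Permanent damage: mark the device and move the module to a spare.
            self.failed.add(device)
            if not self.spares:
                raise RuntimeError("no spare FPGA available")
            target = self.spares.pop(0)
            self.placement[module] = target
            return f"relocate '{module}' from FPGA {device} to spare FPGA {target}"


    if __name__ == "__main__":
        system = MultiFpgaSystem(placement={"filter": 0, "crypto": 1}, spares=[2])
        print(system.handle_fault("filter", recoverable=True))
        print(system.handle_fault("crypto", recoverable=False))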

18 citations


Proceedings ArticleDOI
06 Oct 2010
TL;DR: An enhanced system-level synthesis flow for the design of reliable embedded systems is proposed, extending the classical process to introduce fault mitigation properties in the design under consideration and thus obtain a hardened implementation.
Abstract: This paper proposes an enhanced system-level synthesis flow for the design of reliable embedded systems, extending the classical process to introduce fault mitigation properties in the design under consideration. The strategy first explores the adoption of hardening techniques that, given the initial task graph and the user's reliability requirements, introduce redundancies and mapping constraints on the available resources, which may expose fault detection/tolerance features. The reliability-aware task graph is then implemented by means of a classical mapping and scheduling approach, thus obtaining the hardened implementation. Experimental results are reported to support the validity of the proposal.
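To make the task-graph transformation concrete, the sketch below duplicates tasks whose (assumed) reliability requirement asks for fault detection and inserts a comparison task after each duplicated pair. The representation and the single hardening rule are illustrative assumptions, not the paper's transformation engine.

    # Hypothetical sketch: harden a task graph by duplication-with-comparison for
    # tasks whose requirement is 'detect'. Names and levels are illustrative.

    def harden_task_graph(edges, requirements):
        """edges: list of (src, dst) dependencies.
        requirements: dict task -> 'none' | 'detect' (illustrative hardening levels)."""
        hardened = []
        for src, dst in edges:
            # Downstream tasks consume the checked output of a hardened predecessor.
            out = f"{src}_cmp" if requirements.get(src) == "detect" else src
            hardened.append((out, dst))
            if requirements.get(dst) == "detect":
                hardened.append((out, f"{dst}_dup"))   # replica receives the same inputs
        for task, level in requirements.items():
            if level == "detect":
                # Original and replica feed a comparison task.
                hardened += [(task, f"{task}_cmp"), (f"{task}_dup", f"{task}_cmp")]
        return hardened


    if __name__ == "__main__":
        edges = [("sample", "filter"), ("filter", "log")]
        requirements = {"sample": "none", "filter": "detect", "log": "none"}
        for edge in harden_task_graph(edges, requirements):
            print(edge)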

16 citations


Proceedings ArticleDOI
24 May 2010
TL;DR: The framework integrates three strategies independently designed to tackle the problem of SEUs: an enhanced TMR-based hardening technique, partial dynamic reconfiguration, and a robustness analysis to identify possible TMR failures.
Abstract: This paper presents an enhanced design flow for the implementation of hardened systems on SRAM-based FPGAs, able to cope with the occurrence of Single Event Upsets (SEUs). The framework integrates three strategies independently designed to tackle the problem of SEUs: first, a systematic methodology is used to harden the circuit, exploiting an enhanced TMR-based technique coupled with partial dynamic reconfiguration. Then, a robustness analysis is performed to identify possible TMR failures, which are resolved, where necessary, by a specific local re-design of the critical portions of the implementation. We present the overall flow and the benefits of the solution, experimentally evaluated on a realistic circuit.
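For readers unfamiliar with the underlying principle, the minimal sketch below shows plain triple modular redundancy: three replicas feed a majority voter, and a disagreeing replica can be flagged for repair via partial reconfiguration. It is purely illustrative; the paper's contribution is the design flow and the robustness analysis, not this voter.

    # Minimal sketch of the TMR principle the flow builds on.

    def tmr_vote(a, b, c):
        """Bitwise majority of three replica outputs (integers of equal width)."""
        return (a & b) | (a & c) | (b & c)

    def faulty_replica(a, b, c):
        """Return the index of the disagreeing replica, or None if all agree."""
        voted = tmr_vote(a, b, c)
        mismatches = [i for i, v in enumerate((a, b, c)) if v != voted]
        return mismatches[0] if len(mismatches) == 1 else None


    if __name__ == "__main__":
        # Replica 1 suffered an SEU flipping bit 2.
        outputs = [0b1011, 0b1011 ^ 0b0100, 0b1011]
        print(bin(tmr_vote(*outputs)))     # -> 0b1011, the error is masked
        print(faulty_replica(*outputs))    # -> 1, candidate for partial reconfiguration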

8 citations


Proceedings ArticleDOI
18 Jul 2010
TL;DR: This paper presents a framework for the design space exploration of reliable FPGA systems based on a multi-objective genetic algorithm (NSGA-II) that takes into account several design metrics and outputs a set of Pareto-optimal design solutions.
Abstract: This paper presents a framework for the design space exploration of reliable FPGA systems based on a multi-objective genetic algorithm (NSGA-II). The framework takes into account several design metrics and outputs a set of Pareto-optimal design solutions. The framework is compared to the multi-objective version of simulated annealing (AMOSA) and it is empirically studied in terms of scalability using three real-world circuits and a set of synthetic problems of different sizes. Our results show that the proposed approach generates a rich set of Pareto-optimal solutions whereas AMOSA tends to find suboptimal solutions. Our empirical scalability analysis shows that, while the problem space is exponential in the number n of functional units constituting the system, the number of evaluations required by our framework grows as O(n^3.6).
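The following sketch only illustrates the Pareto-optimality notion the exploration relies on: candidate implementations are scored on several metrics to be minimized and only non-dominated ones are kept. The metric names and design points are assumptions; the actual framework evolves candidates with NSGA-II rather than filtering a fixed set.

    # Illustrative Pareto-front filter over hypothetical (area, latency, expected
    # failures) scores; all objectives are minimized.

    def dominates(a, b):
        """a dominates b if it is no worse in every objective and better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(solutions):
        """solutions: dict name -> tuple of objective values (all minimized)."""
        return {
            name: objs
            for name, objs in solutions.items()
            if not any(dominates(other, objs)
                       for other_name, other in solutions.items() if other_name != name)
        }


    if __name__ == "__main__":
        candidates = {
            "plain":      (100, 10, 0.20),
            "dwc":        (160, 11, 0.05),
            "tmr":        (250, 12, 0.01),
            "bad_hybrid": (260, 13, 0.05),   # dominated by "tmr"
        }
        print(sorted(pareto_front(candidates)))   # -> ['dwc', 'plain', 'tmr']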

8 citations


Proceedings ArticleDOI
01 Nov 2010
TL;DR: A quantitative metric for the evaluation of the diagnostic resolution of a test set is proposed, together with an algorithm for the minimal extension of a given test set so as to provide complete discrimination of the failures affecting a system, intended as a support for analysts during the definition of a testing framework.
Abstract: Fault diagnosis is the task of identifying a faulty component in a complex system using data collected from a test session. Diagnostic resolution, that is, the ability to discriminate a faulty component within a set of possible candidates, is a property that the system model must expose to provide accuracy and robustness in the diagnosis. Such a property depends on the selection of an appropriate test set capable of providing a unique interpretation of the test outcomes. In this paper, a quantitative metric for the evaluation of the diagnostic resolution of a test set is proposed, together with an algorithm for the minimal extension of a given test set so as to provide complete discrimination of the failures affecting a system, to be used as a support for analysts during the definition of a testing framework.
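A rough, hypothetical rendering of the two ideas in the abstract: measure diagnostic resolution as the fraction of component pairs whose expected test signatures differ, and greedily add tests from a candidate pool until every pair is discriminated. The signature model and greedy choice are assumptions, not the paper's metric or algorithm.

    from itertools import combinations

    def resolution(signature, tests):
        """signature[component][test] -> expected pass/fail; tests: selected subset."""
        pairs = list(combinations(signature, 2))
        split = sum(
            1 for a, b in pairs
            if any(signature[a][t] != signature[b][t] for t in tests)
        )
        return split / len(pairs) if pairs else 1.0

    def extend_test_set(signature, selected, pool):
        """Greedily add tests from `pool` until resolution stops improving."""
        selected = list(selected)
        while resolution(signature, selected) < 1.0 and pool:
            best = max(pool, key=lambda t: resolution(signature, selected + [t]))
            if resolution(signature, selected + [best]) <= resolution(signature, selected):
                break                       # no remaining test helps
            selected.append(best)
            pool = [t for t in pool if t != best]
        return selected


    if __name__ == "__main__":
        sig = {"C1": {"t1": 1, "t2": 0, "t3": 0},
               "C2": {"t1": 1, "t2": 1, "t3": 0},
               "C3": {"t1": 0, "t2": 1, "t3": 1}}
        print(resolution(sig, ["t1"]))                        # -> 0.666...
        print(extend_test_set(sig, ["t1"], ["t2", "t3"]))     # -> ['t1', 't2']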

7 citations


Proceedings ArticleDOI
01 Sep 2010
TL;DR: This paper focuses on the evolution of the BBN node probabilities, to define a stop criterion that interrupts the diagnosis process when additional test outcomes would not provide further useful information for identifying the faulty candidate.
Abstract: iAF2D (incremental Automatic Functional Fault Detective) is a methodology for the identification of the faulty component in a complex system using data collected from a test session. It is an incremental approach based on a Bayesian Belief Network, where the model of the system under analysis is extracted from a fault signature description. iAF2D reduces time, cost, and effort during the diagnostic phase by implementing a step-by-step selection of the tests to be executed from the set of available tests. This paper focuses on the evolution of the BBN node probabilities, to define a stop criterion that interrupts the diagnosis process when additional test outcomes would not provide further useful information for identifying the faulty candidate. The methodology is validated on a set of experimental results.
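The sketch below illustrates a stop criterion in the spirit of the abstract: after each test outcome the posterior over candidate components is updated, and the session stops once the leading candidate is sufficiently separated from the others. The naive Bayes update and the thresholds are assumptions, not iAF2D itself.

    # Illustrative posterior update and stop criterion for incremental diagnosis.

    def posterior(prior, likelihood, outcomes):
        """Naive-Bayes style update: likelihood[comp][test][outcome] -> probability."""
        scores = {c: prior[c] for c in prior}
        for test, result in outcomes.items():
            for c in scores:
                scores[c] *= likelihood[c][test][result]
        total = sum(scores.values()) or 1.0
        return {c: v / total for c, v in scores.items()}

    def should_stop(post, confidence=0.9, min_margin=0.4):
        """Stop when the best candidate is confident or well separated from the rest."""
        ranked = sorted(post.values(), reverse=True)
        best, second = ranked[0], (ranked[1] if len(ranked) > 1 else 0.0)
        return best >= confidence or (best - second) >= min_margin


    if __name__ == "__main__":
        prior = {"C1": 0.5, "C2": 0.5}
        like = {"C1": {"t1": {"fail": 0.9, "pass": 0.1}},
                "C2": {"t1": {"fail": 0.2, "pass": 0.8}}}
        post = posterior(prior, like, {"t1": "fail"})
        print(post, should_stop(post))   # C1 ~ 0.82, stop: margin above 0.4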

6 citations


Journal ArticleDOI
TL;DR: This work proposes the Esteem approach (Emergent Semantics and cooperaTion in multi-knowledgE EnvironMents), a comprehensive framework and platform for data and service discovery in P2P systems, with advanced solutions for trust and quality-based data management, P2P infrastructure definition, query processing, and dynamic service discovery in a context-aware scenario.
Abstract: In the present global society, information has to be exchangeable in open and dynamic environments, where interacting users do not necessarily share a common understanding of the world at hand. This is particularly true in P2P scenarios, where millions of autonomous users (peers) need to cooperate by sharing their resources (such as data and services). We propose the Esteem approach (Emergent Semantics and cooperaTion in multi-knowledgE EnvironMents), a comprehensive framework and platform for data and service discovery in P2P systems, with advanced solutions for trust and quality-based data management, P2P infrastructure definition, query processing, and dynamic service discovery in a context-aware scenario. In Esteem, semantic communities are built around declared interests in the form of manifesto ontologies, and their autonomous nature is preserved by allowing a shared semantics to emerge naturally from peer interactions. Inside the borders of semantic communities, data and services are discovered, queried, and invoked in a resource-sharing scenario, where the context in which users interoperate and the trustworthiness of exchanged information are also relevant aspects to take into account.
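As a very loose illustration of the community-formation idea only: a peer might join a semantic community when its declared interests overlap sufficiently with that community's manifesto ontology. The set-overlap measure, threshold, and data are illustrative assumptions; Esteem relies on richer ontology matching than keyword overlap.

    # Hypothetical sketch: match a peer's declared interests against community
    # manifesto ontologies using a simple Jaccard overlap.

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def matching_communities(peer_interests, manifestos, threshold=0.3):
        """manifestos: community name -> set of concepts in its manifesto ontology."""
        return [name for name, concepts in manifestos.items()
                if jaccard(peer_interests, concepts) >= threshold]


    if __name__ == "__main__":
        manifestos = {
            "health-data": {"patient", "therapy", "sensor", "privacy"},
            "smart-city":  {"traffic", "sensor", "energy", "mobility"},
        }
        peer = {"sensor", "energy", "mobility"}
        print(matching_communities(peer, manifestos))   # -> ['smart-city']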

6 citations


Proceedings ArticleDOI
06 Oct 2010
TL;DR: This paper focuses on the evolution of the Bayesian Belief Network node probabilities, presenting some selection heuristics to reduce the number of required tests.
Abstract: Incremental Automatic Functional Fault Detective is an incremental methodology based on a Bayesian Belief Network for the identification of the faulty component in a complex system, using data collected from a test session. It reduces time, cost, and effort during the diagnostic phase by implementing a step-by-step selection of the tests to be executed from the set of available tests. This paper focuses on the evolution of the Bayesian Belief Network node probabilities, presenting some selection heuristics to reduce the number of required tests. Validation is performed on a set of experimental results.
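The sketch below shows one plausible test-selection heuristic in the spirit of the abstract: pick, at each step, the test expected to reduce the uncertainty (entropy) over the faulty-candidate distribution the most. The probabilistic model is a simplification and the heuristic is an assumption, not necessarily one of those evaluated in the paper.

    import math

    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def expected_entropy(post, likelihood, test):
        """Average posterior entropy over the test's possible outcomes."""
        result = 0.0
        for outcome in ("pass", "fail"):
            p_outcome = sum(post[c] * likelihood[c][test][outcome] for c in post)
            if p_outcome == 0:
                continue
            updated = {c: post[c] * likelihood[c][test][outcome] / p_outcome for c in post}
            result += p_outcome * entropy(updated)
        return result

    def pick_next_test(post, likelihood, remaining_tests):
        return min(remaining_tests, key=lambda t: expected_entropy(post, likelihood, t))


    if __name__ == "__main__":
        post = {"C1": 0.5, "C2": 0.5}
        like = {"C1": {"t1": {"pass": 0.5, "fail": 0.5}, "t2": {"pass": 0.1, "fail": 0.9}},
                "C2": {"t1": {"pass": 0.5, "fail": 0.5}, "t2": {"pass": 0.9, "fail": 0.1}}}
        print(pick_next_test(post, like, ["t1", "t2"]))   # -> 't2', the more informative test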

2 citations


Journal ArticleDOI
TL;DR: This editorial introduces the special section on System-Level Design of Reliable Architectures to the audience of the IEEE Transactions on Computers; six papers have been selected, covering a wide spectrum of topics ranging from architectural fault-tolerance techniques to formal methodologies for reliability analysis.
Abstract: It is with great pleasure that we introduce this special section on System-Level Design of Reliable Architectures to the audience of the IEEE Transactions on Computers. Six papers have been selected, covering a wide spectrum of topics ranging from architectural fault-tolerance techniques to formal methodologies for reliability analysis. These papers are authored by leading researchers in the field and cover theoretical and experimental topics.

The widespread use of electronics in our lives is directing more and more attention to the reliability properties of such systems, in order to preserve both the user's and the environment's safety; therefore, the design of reliable architectures is today a necessity rather than an option, even in non-critical application domains. At the same time, these systems are reaching high complexity levels, leading designers both to develop specific components and to use and compose existing ones to achieve the desired overall functionality. In the former case, ad hoc techniques may be devised, acting on either the hardware or the software to cope with the occurrence of faults. In the latter situation, when combining independently designed modules, the enhancement and assessment of reliability becomes particularly important; for instance, specific approaches are required both to apply fault detection/tolerance techniques from the initial steps of the design flow and to evaluate the effects of faults in a component while it interacts with the other components composing the overall system. As a result, the entire design flow needs to be enhanced to support reliability: from the initial modelling of the system, together with the desired properties/requirements and the fault model, to the hardware/software partitioning step and the subsequent design exploration phase, where the more traditional metrics covering performance, cost, and power consumption need to be modified to also weigh fault detection/tolerance capabilities. Functional verification and reliability analysis constitute two further aspects of this scenario, assessing the quality of the designed system in terms of its correctness and its ability to deal with failures.

In this scenario, new advances have been achieved in all the relevant issues pertaining to the system-level design of reliable systems, supporting designers in the development of innovative architectures able to cope with the occurrence of failures. Such advances have led to the definition of new methodologies as well as new architectures. Furthermore, depending on the application environment in which the system will be adopted, different classes of reliability might be necessary; in some situations it is sufficient to achieve an autonomous fault detection capability, whereas in critical environments fault effects need to be completely masked, thus providing fault tolerance properties.

The six papers presented in this special section were selected to address the different aspects of the important challenges related to the system-level design of reliable systems. They cover the various facets of the issue, offering interesting solutions to tackle specific problems. The first two papers deal with reliability analysis, which has become a fundamental tool for computer engineers in the validation of the design of hardened system architectures, in particular in safety- and mission-critical domains, such as medicine, the military, and transportation.

The first paper is entitled "Formal Reliability Analysis Using Theorem Proving" by Osman Hasan, Sofiene Tahar, and Naeem Abbasi. This paper addresses an important aspect of reliability analysis, attempting to introduce formal verification instead of simulation-based and probabilistic approaches to assess the fault tolerance characteristics of the designed systems. The authors propose to conduct a formal reliability analysis of systems within the framework of a higher-order-logic theorem prover. In this paper, they present the higher-order-logic formalization of some fundamental reliability theory concepts, which can be built upon to precisely analyze the reliability of various engineering systems. The proposed formalization is then applied to analyze the repairability conditions for a reconfigurable memory array in the presence of stuck-at and coupling faults.

Still within the context of reliability analysis, the second paper, entitled "Efficient Microarchitectural Vulnerabilities Prediction Using Boosted Regression Trees and Patient Rule Inductions," by Bin Li, Lide Duan, and Lu Peng, deals with Architectural Vulnerability Factor (AVF) analysis, which reflects the possibility that a transient fault eventually causes a visible error in the program output and thus indicates a system's susceptibility to transient faults. This metric is increasingly being adopted to evaluate microprocessor architectures, due to their high vulnerability to transient faults, which derives from shrinking feature sizes, lower threshold voltages, and increasing frequencies. The authors propose an innovative way to predict the architectural vulnerability factor using Boosted Regression Trees, a nonparametric tree-based predictive modeling scheme, to identify the correlation, across workloads, execution phases, and processor configurations, between the estimated AVF of a key processor structure and various performance metrics.

The next two papers deal with fault detection techniques for different architectural components. The first paper is entitled "Concurrent Structure-Independent Fault Detection Schemes for the Advanced Encryption Standard," authored by Mehran Mozaffari-Kermani and Arash Reyhani-Masoleh.