Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems
Frequently Asked Questions (17)
Q2. What future work is identified in "Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems"?
The most prominent point is that mapping and SW provide a lot of flexibility, owing to the possibility of re-mapping a given task sequence onto the “fixed” HW. Networked applications have further expanded the range of deliverable functionality. The system behavior can be adapted at run time whenever significant environmental changes take place, or according to varying error rates. This is especially so because errors can be masked as they propagate through the different hardware and software layers (including the application itself).
Q3. What are the cons of a checkpointing scheme?
Cons include latency (depending on the checkpointing granularity), performance overhead (depending also on whether checkpointing is overlapped with normal execution), and the limitation to transient errors.
Q4. What are the cons of a hybrid?
Cons include the need for system-specific solutions, the low error protection (through isolation), and the potential performance degradation.
Q5. What are the cons of checkpointing a system?
Cons include the potentially high storage and power overhead, and the potentially very high latency and performance overhead (depending also on whether checkpointing is overlapped with normal execution).
Q6. What are the challenges and opportunities for the fault tolerance techniques?
Further technology trends like 3D integration, incorporating heterogeneous technologies on a single platform and dark silicon pose new challenges and opportunities for the fault tolerance techniques.
Q7. What are some examples of emerging error-tolerant application domains?
Other examples of emerging error-tolerant application domains are Recognition, Mining and Synthesis (RMS) [Dubey 2005] as well as artificial neural networks (ANNs) [Temam 2012].
Q8. What are the pros and cons of the HW module?
Pros include limited area, power, and performance overhead, as the new implementation will typically satisfy the system requirements while minimizing additional cost.
Q9. What is the term task in this paper?
The term task in this paper is used as an umbrella term, which can denote ... [ACM Computing Surveys, Vol. V, No. N, Article XX, Publication date: January XXXX.]
Q10. What are the main criteria for further categorizing into classes?
These four classes are discussed in the following subsections, as shown in Figure 13. The main criteria for further categorization into classes include whether modifications are required in existing functionalities, existing task implementations, the resource allocation, the interaction with neighbouring tasks, the execution mode (of additional tasks), or the cooperation among HW modules.
Q11. What is the concept of storing checkpoints in a customized way?
Rather than saving checkpoints at fixed intervals, checkpoints can be stored in a customized way so that the amount of stored data is minimized.
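One way to picture a checkpoint that minimizes stored data is an incremental (delta) checkpoint, which records only what changed since the previous one. The sketch below is an illustration under that assumption, not the scheme described in the paper; the function names `delta_checkpoint` and `restore` and the dictionary-based state are hypothetical.

```python
def delta_checkpoint(previous, current):
    """Record only the entries that changed or disappeared since `previous`.

    Assumption: the task state is a flat dict of comparable values.
    """
    changed = {k: v for k, v in current.items()
               if k not in previous or previous[k] != v}
    removed = [k for k in previous if k not in current]
    return {"changed": changed, "removed": removed}


def restore(base, deltas):
    """Rebuild a state by replaying the deltas, oldest first, on a full base checkpoint."""
    state = dict(base)
    for delta in deltas:
        state.update(delta["changed"])
        for key in delta["removed"]:
            state.pop(key, None)
    return state


base = {"a": 1, "b": 2, "c": 3}       # one full checkpoint
state_1 = {"a": 1, "b": 5, "c": 3}    # only "b" changed
state_2 = {"a": 1, "b": 5, "d": 7}    # "c" removed, "d" added

d1 = delta_checkpoint(base, state_1)
d2 = delta_checkpoint(state_1, state_2)
recovered = restore(base, [d1, d2])
```

The trade-off in this sketch is that each delta is small, but recovery has to replay every delta since the last full checkpoint.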
Q12. What are the pros and cons of local schemes?
Compared to global schemes, local schemes reduce the amount of data to be stored during checkpointing but typically require a more complicated recovery algorithm.
Q13. What are the pros and cons of adding modules with different functionality?
Instead of adding modules with the same functionality, modules with different functionality can be added; the added modules play an active role in the recovery as in the previous category.
Q14. What is the difference between error recovery and repair?
Error recovery is further split into forward error recovery (FER), which includes redundancy such as triple modular redundancy, and backward error recovery (BER), which includes rolling back to a previously saved correct state of the system.
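As a rough illustration of forward error recovery, triple modular redundancy can be sketched as three executions of the same task followed by a majority vote. This is a minimal software sketch, not the paper's implementation; `tmr_vote`, `run_tmr`, and the `faulty_replica` fault-injection parameter are hypothetical names introduced here.

```python
from collections import Counter


def tmr_vote(replica_outputs):
    """Return the value produced by at least two of the three replicas."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all replicas disagree")
    return value


def run_tmr(task, x, faulty_replica=None):
    """Run `task` three times and vote on the results.

    `faulty_replica` is a hypothetical knob that corrupts one replica's
    integer output to emulate a transient error.
    """
    outputs = []
    for i in range(3):
        result = task(x)
        if i == faulty_replica:
            result ^= 1  # flip the lowest bit of the result
        outputs.append(result)
    return tmr_vote(outputs)


masked = run_tmr(lambda v: v * 2, 21, faulty_replica=1)
```

Here a single corrupted replica is outvoted, so execution continues without rolling back; two disagreeing corruptions would leave no majority and the vote would fail.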
Q15. What are the types of systems that are amenable to non-deterministic events?
Beyond the earlier discussed types of systems, intra-module schemes may address applications that are amenable to numerous non-deterministic events: uncertain functions (like human input functions), interrupts, system calls, I/O operations due to communication with external devices.
Q16. What are the pros and cons of online multiprocessor checkpointing?
System-specific strategies have been developed which deal with events coming from the external environment, especially events due to communication with external devices. Online multiprocessor checkpointing can be broadly characterized as local and global.
Q17. What is the other group of backward techniques?
The other group of backward techniques includes the techniques that retry the execution by storing the state of the system at intermediate points.
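The retry idea can be sketched as follows: save the state before each step, and if the step fails, restore the saved state and execute it again. This is a simplified single-process illustration under stated assumptions (dictionary state, errors signalled by an exception); `TransientError`, `run_with_checkpoints`, and `max_retries` are hypothetical names, not taken from the paper.

```python
import copy


class TransientError(Exception):
    """Hypothetical signal that a step was hit by a detected error."""


def run_with_checkpoints(state, steps, max_retries=3):
    """Run `steps` in order, rolling back and retrying a step on TransientError."""
    for step in steps:
        checkpoint = copy.deepcopy(state)  # state stored at an intermediate point
        for _ in range(max_retries + 1):
            try:
                step(state)
                break  # step completed, move on to the next one
            except TransientError:
                # Roll back: discard partial updates and restore the saved state.
                state.clear()
                state.update(copy.deepcopy(checkpoint))
        else:
            # Retrying only helps against transient errors.
            raise RuntimeError("error persisted after all retries")
    return state


calls = {"n": 0}


def increment(s):
    s["total"] += 1


def flaky_add_ten(s):
    s["total"] += 10          # partial update happens before the error
    calls["n"] += 1
    if calls["n"] == 1:       # fail on the first attempt only
        raise TransientError


result = run_with_checkpoints({"total": 0}, [increment, flaky_add_ten])
```

In this sketch the first attempt of `flaky_add_ten` leaves a partial update behind, which the rollback discards before the retry; an error that recurs on every attempt exhausts the retries and is reported instead.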