Open Access
An Architecture for Runtime Evaluation of SoC Reliability
Andreas Bernauer, Oliver Bringmann, Wolfgang Rosenstiel, Abdelmajid Bouajila, Walter Stechele, Andreas Herkersdorf
pp. 177-184
TL;DR: This paper presents an architecture to evaluate the reliability of a system-on-chip (SoC) during its runtime that also accounts for the system's redundancy, and proposes to integrate an autonomic layer into the SoC to detect the chip's current condition and instruct appropriate countermeasures.
Abstract:
This paper presents an architecture to evaluate the reliability of a system-on-chip (SoC) during its runtime that also accounts for the system's redundancy. We propose to integrate an autonomic layer into the SoC to detect the chip's current condition and instruct appropriate countermeasures. In the autonomic layer, error counters are used to count the number of errors within a fixed time interval. The counters' values accumulate into a global register representing the system's reliability. The accumulation takes into account the series and parallel composition of the system.
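The abstract does not give the exact accumulation rule, so the following is only a minimal sketch of the idea, assuming classic reliability-block-diagram composition: a series block works only if all its parts work (reliabilities multiply), and a parallel, redundant block fails only if all its parts fail. The per-component estimate (fraction of error-free cycles in the window), the class names `ErrorCounter`, `Series`, `Parallel`, and the 8-bit `to_register` quantization are all hypothetical illustrations, not the authors' design.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod
from typing import Union


@dataclass
class ErrorCounter:
    """Hypothetical per-component monitor counting errors in a fixed window."""
    name: str
    window_cycles: int  # assumed length of the fixed time interval
    errors: int = 0

    def record_error(self) -> None:
        self.errors += 1

    def reliability(self) -> float:
        # Assumption: reliability estimated as the fraction of error-free cycles.
        return max(0.0, 1.0 - self.errors / self.window_cycles)

    def reset(self) -> None:
        # Called at the end of each interval so counts do not carry over.
        self.errors = 0


@dataclass
class Series:
    parts: list  # every part must work for the block to work


@dataclass
class Parallel:
    parts: list  # redundant parts: the block fails only if all parts fail


Block = Union[ErrorCounter, Series, Parallel]


def system_reliability(block: Block) -> float:
    """Recursively combine component estimates using series/parallel rules."""
    if isinstance(block, ErrorCounter):
        return block.reliability()
    if isinstance(block, Series):
        return prod(system_reliability(p) for p in block.parts)
    if isinstance(block, Parallel):
        return 1.0 - prod(1.0 - system_reliability(p) for p in block.parts)
    raise TypeError(f"unsupported block: {block!r}")


def to_register(r: float, bits: int = 8) -> int:
    """Hypothetical fixed-point encoding of the result for a global register."""
    return round(r * (2**bits - 1))


# Example: two redundant cores (parallel) feeding one shared bus (series).
cpu0 = ErrorCounter("cpu0", window_cycles=1000)
cpu1 = ErrorCounter("cpu1", window_cycles=1000)
bus = ErrorCounter("bus", window_cycles=1000)
for _ in range(100):
    cpu0.record_error()
for _ in range(200):
    cpu1.record_error()
for _ in range(10):
    bus.record_error()

soc = Series([Parallel([cpu0, cpu1]), bus])
r = system_reliability(soc)
print(f"{r:.4f}", to_register(r))  # prints: 0.9702 247
```

Here the redundant core pair scores 1 - 0.1 * 0.2 = 0.98 even though each core alone is at most 0.9, which illustrates why the accumulation must respect the system's parallel structure rather than simply summing counters.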
Citations
Learning Classifier Tables for Autonomic Systems on Chip.
TL;DR: This paper introduces a new hardware-based machine learning building block – called Learning Classifier Table (LCT) – for the run-time reliability, performance and power optimization of future generations of Systems-on-Chip.
Journal Article
Sustainable Modular Adaptive Redundancy Technique Emphasizing Partial Reconfiguration for Reduced Power Consumption
TL;DR: SMART was evaluated using a Sobel edge-detection application and was shown to tolerate stressful sequences of injected transient and permanent faults while reducing dynamic power consumption by 30% compared to conventional triple modular redundancy techniques, with nominal impact on the fault-tolerance capabilities.
Proceedings Article
Applying autonomic principles for workload management in multi-core systems on chip
TL;DR: It is shown how monitors can be added to quantify the operating state of a typical processor core, whereupon a learning classifier system evaluator can determine appropriate actions to be performed in order to optimize the frequency and task distribution across the system.
Book Chapter
Autonomic workload management for multi-core processor systems
TL;DR: It is shown that Learning Classifier Tables, a simplified XCS-based reinforcement learning technique optimized for a low-overhead hardware implementation and integration, achieves nearly optimal results for dynamic workload balancing during run time for a standard networking application at task level.
Applying ASoC to Multi-core Applications for Workload Management
Johannes Zeppenfeld, Abdelmajid Bouajila, Walter Stechele, Andreas Herkersdorf, Andreas Bernauer, Oliver Bringmann, Wolfgang Rosenstiel
TL;DR: It is shown that Learning Classifier Tables, a simplified XCS-based reinforcement learning technique optimised for a low-overhead hardware implementation and integration, achieve nearly optimal results for task-level dynamic workload balancing during run time for a standard networking application.
References
Journal Article
Designing reliable systems from unreliable components: the challenges of transistor variability and degradation
TL;DR: This article discusses effects of variability in transistor performance and proposes microarchitecture, circuit, and testing research that focuses on designing with many unreliable components (transistors) to yield reliable system designs.
Proceedings Article
Razor: a low-power pipeline based on circuit-level timing speculation
Daniel J. Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev R. Rao, Toan Pham, Conrad H. Ziesler, David Blaauw, Todd Austin, Krisztian Flautner, Trevor Mudge
TL;DR: A solution by which the circuit can be operated even below the 'critical' voltage, so that no voltage margins are required and thus more energy can be saved.
Journal Article
Robust system design with built-in soft-error resilience
TL;DR: A new design paradigm reuses design-for-testability and debug resources to eliminate transient errors caused by terrestrial radiation in chip designs.
Proceedings Article
Time redundancy based soft-error tolerance to rescue nanometer technologies
TL;DR: This work uses time redundancy techniques to derive low cost soft-error tolerant implementations for logic networks in response to the increased operating frequencies, geometry shrinking and power supply reduction that accompany the process of very deep submicron scaling.
Proceedings Article
AR-SMT: a microarchitectural approach to fault tolerance in microprocessors
TL;DR: A new time-redundancy fault-tolerant approach in which a program is duplicated and the two redundant programs run simultaneously on the processor. The technique exploits several significant microarchitectural trends to provide broad coverage of transient faults and restricted coverage of permanent faults.