An Architecture for Runtime Evaluation of SoC Reliability

TLDR
This paper presents an architecture to evaluate the reliability of a system-on-chip (SoC) during runtime that also accounts for the system’s redundancy. It proposes integrating an autonomic layer into the SoC to detect the chip’s current condition and instruct appropriate countermeasures.
Abstract
This paper presents an architecture for evaluating the reliability of a system-on-chip (SoC) at runtime while also accounting for the system’s redundancy. We propose integrating an autonomic layer into the SoC that detects the chip’s current condition and instructs appropriate countermeasures. In the autonomic layer, error counters record the number of errors that occur within a fixed time interval. The counter values are accumulated into a global register that represents the system’s reliability. The accumulation takes the series and parallel composition of the system into account.
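
The accumulation idea can be illustrated with a minimal software sketch. It is not the paper's hardware design: it uses the textbook reliability block diagram rules (product for series, complement of the product of complements for parallel). The mapping from an error count to a reliability estimate (`1 - errors / operations`) is an assumption made for this example. The names `Counter`, `Series`, `Parallel` and `reliability` are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence, Union


@dataclass(frozen=True)
class Counter:
    """Error counter of one unit, read at the end of a fixed interval."""
    errors: int       # errors observed during the interval
    operations: int   # operations attempted during the interval (assumed known)


@dataclass(frozen=True)
class Series:
    """All parts must work for the block to work."""
    parts: Sequence["Block"]


@dataclass(frozen=True)
class Parallel:
    """Redundant parts: the block works if at least one part works."""
    parts: Sequence["Block"]


Block = Union[Counter, Series, Parallel]


def reliability(block: Block) -> float:
    """Fold the counter values into one system-level reliability estimate."""
    if isinstance(block, Counter):
        if block.operations <= 0:
            return 1.0  # nothing executed, so no evidence of failure
        # Assumed mapping: fraction of error-free operations, clamped at 0.
        return max(0.0, 1.0 - block.errors / block.operations)
    values = [reliability(part) for part in block.parts]
    if isinstance(block, Series):
        return prod(values)                      # R = R1 * R2 * ...
    return 1.0 - prod(1.0 - v for v in values)   # R = 1 - (1-R1)(1-R2)...


# Hypothetical SoC: one bus in series with two redundant cores.
soc = Series([
    Counter(errors=1, operations=100),
    Parallel([
        Counter(errors=10, operations=100),
        Counter(errors=20, operations=100),
    ]),
])
system_reliability = reliability(soc)
```

For this example the redundant pair masks most of the core errors, so the system-level estimate comes out at roughly 0.97 even though the individual cores are at 0.9 and 0.8.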


Citations

Learning Classifier Tables for Autonomic Systems on Chip.

TL;DR: This paper introduces a new hardware-based machine learning building block – called Learning Classifier Table (LCT) – for the run-time reliability, performance and power optimization of future generations of Systems-on-Chip.
Journal Article

Sustainable Modular Adaptive Redundancy Technique Emphasizing Partial Reconfiguration for Reduced Power Consumption

TL;DR: SMART was evaluated using a Sobel edge-detection application and was shown to tolerate stressful sequences of injected transient and permanent faults while reducing dynamic power consumption by 30% compared to conventional triple modular redundancy techniques, with nominal impact on the fault-tolerance capabilities.
Proceedings Article

Applying autonomic principles for workload management in multi-core systems on chip

TL;DR: It is shown how monitors can be added to quantify the operating state of a typical processor core, whereupon a learning classifier system evaluator can determine appropriate actions to be performed in order to optimize the frequency and task distribution across the system.
Book Chapter

Autonomic workload management for multi-core processor systems

TL;DR: It is shown that Learning Classifier Tables, a simplified XCS-based reinforcement learning technique optimized for a low-overhead hardware implementation and integration, achieves nearly optimal results for dynamic workload balancing during run time for a standard networking application at task level.

Applying ASoC to Multi-core Applications for Workload Management

TL;DR: It is shown that Learning Classifier Tables, a simplified XCS-based reinforcement learning technique optimised for a low-overhead hardware implementation and integration, achieve nearly optimal results for task-level dynamic workload balancing during run time for a standard networking application.