An Architecture for Runtime Evaluation of SoC Reliability

pp. 177-184

Abstract:

This paper presents an architecture for evaluating the reliability of a system-on-chip (SoC) at runtime while also accounting for the system’s redundancy. We propose integrating an autonomic layer into the SoC that detects the chip’s current condition and initiates appropriate countermeasures. In the autonomic layer, error counters count the number of errors that occur within a fixed time interval. The counters’ values are accumulated into a global register that represents the system’s reliability. The accumulation takes into account the series and parallel composition of the system.
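The accumulation described in the abstract can be illustrated with a small software sketch. The abstract does not give the actual accumulation rule or the register format, so everything below is an assumption: the mapping from a counter value to a per-unit reliability estimate (`1 - errors / max_errors`), the names `Unit`, `Series`, `Parallel`, and `example_soc`, and the use of the textbook reliability-block-diagram rules (product of reliabilities for series blocks, one minus the product of unreliabilities for parallel blocks). It is not the paper's hardware implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Unit:
    """A monitored block with one error counter (hypothetical model)."""
    name: str
    errors: int       # errors counted in the current fixed time interval
    max_errors: int   # assumed saturation value of the counter

    def reliability(self) -> float:
        # Assumed mapping: a saturated counter means reliability 0.
        clipped = min(max(self.errors, 0), self.max_errors)
        return 1.0 - clipped / self.max_errors


@dataclass
class Series:
    """All parts must work: reliabilities multiply."""
    parts: List["Block"]

    def reliability(self) -> float:
        result = 1.0
        for part in self.parts:
            result *= part.reliability()
        return result


@dataclass
class Parallel:
    """Redundant parts: the block fails only if every part fails."""
    parts: List["Block"]

    def reliability(self) -> float:
        all_fail = 1.0
        for part in self.parts:
            all_fail *= 1.0 - part.reliability()
        return 1.0 - all_fail


Block = Union[Unit, Series, Parallel]


def example_soc() -> Series:
    """A made-up SoC: one bus in series with two redundant cores."""
    bus = Unit("bus", errors=0, max_errors=10)
    core_a = Unit("core_a", errors=2, max_errors=10)
    core_b = Unit("core_b", errors=5, max_errors=10)
    return Series([bus, Parallel([core_a, core_b])])


def global_register(system: Block) -> float:
    """Stand-in for the global register: the accumulated system value."""
    return system.reliability()
```

Calling `global_register(example_soc())` combines the two redundant cores first and then multiplies the result by the bus estimate, so an error burst in one core lowers the system value less than the same burst on the non-redundant bus.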

Citations

Learning Classifier Tables for Autonomic Systems on Chip.

Johannes Zeppenfeld, +3 more

TL;DR: This paper introduces a new hardware-based machine learning building block – called Learning Classifier Table (LCT) – for the run-time reliability, performance and power optimization of future generations of Systems-on-Chip.

Journal Article

Sustainable Modular Adaptive Redundancy Technique Emphasizing Partial Reconfiguration for Reduced Power Consumption

Rawad Al-Haddad, +3 more

25 Aug 2011

International Journal of Reconfigurable ...

TL;DR: SMART was evaluated using a Sobel edge-detection application and was shown to tolerate stressful sequences of injected transient and permanent faults while reducing dynamic power consumption by 30% compared to conventional triple modular redundancy techniques, with nominal impact on the fault-tolerance capabilities.

Proceedings Article

Applying autonomic principles for workload management in multi-core systems on chip

Johannes Zeppenfeld, +1 more

TL;DR: It is shown how monitors can be added to quantify the operating state of a typical processor core, whereupon a learning classifier system evaluator can determine appropriate actions to be performed in order to optimize the frequency and task distribution across the system.

Book Chapter

Autonomic workload management for multi-core processor systems

Johannes Zeppenfeld, +1 more

TL;DR: It is shown that Learning Classifier Tables, a simplified XCS-based reinforcement learning technique optimized for a low-overhead hardware implementation and integration, achieves nearly optimal results for dynamic workload balancing during run time for a standard networking application at task level.

Applying ASoC to Multi-core Applications for Workload Management

Johannes Zeppenfeld, +6 more

TL;DR: It is shown that Learning Classifier Tables, a simplified XCS-based reinforcement learning technique optimised for a low-overhead hardware implementation and integration, achieve nearly optimal results for task-level dynamic workload balancing during run time for a standard networking application.
