Journal Article

Rescue: A Microarchitecture for Testability and Defect Tolerance

Ethan Schuchman, +1 more
- Vol. 33, Iss: 2, pp 160-171
Abstract
Scaling feature size improves processor performance but increases each device's susceptibility to defects (i.e., hard errors). As a result, fabrication technology must improve significantly to maintain yields. Redundancy techniques in memory have been successful at improving yield in the presence of defects. Apart from core sparing, which disables faulty cores in a chip multiprocessor, little has been done to target the core logic. While previous work has proposed that either inherent or added redundancy in the core logic can be used to tolerate defects, the key issues of realistic testing and fault isolation have been ignored. This paper is the first to consider testability and fault isolation in designing modern high-performance, defect-tolerant microarchitectures. We define intra-cycle logic independence (ICI) as the condition needed for conventional scan test to isolate faults quickly to the microarchitectural-block granularity. We propose logic transformations to redesign conventional superscalar microarchitecture to comply with ICI. We call our novel, testable, and defect-tolerant microarchitecture Rescue. We build a Verilog model of Rescue and verify that faults can be isolated to the required precision using only conventional scan test. Using performance simulations, we show that ICI transformations reduce IPC by only 4% on average for SPEC2000 programs. Taking yield improvement into account, Rescue improves average yield-adjusted instruction throughput over core sparing by 12% and 22% at the 32 nm and 18 nm technology nodes, respectively.
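
The abstract does not spell out how ICI is checked, so the following Python sketch is only one possible reading, not the paper's method. It assumes that ICI can be approximated as "the combinational logic feeding each scan cell within one cycle belongs to a single microarchitectural block", so that a failing scan cell points at exactly one block. The block names, scan-cell names, and both helper functions are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical model: each scan cell maps to the set of microarchitectural
# blocks whose logic feeds it within a single cycle (an assumed reading of ICI).
ConeMap = Dict[str, Set[str]]


def satisfies_ici(cone_blocks: ConeMap) -> bool:
    """True if every scan cell is fed by logic from exactly one block."""
    return all(len(blocks) == 1 for blocks in cone_blocks.values())


def isolate_faults(cone_blocks: ConeMap, failing_cells: Iterable[str]) -> Set[str]:
    """Return the candidate faulty blocks implied by the failing scan cells."""
    candidates: Set[str] = set()
    for cell in failing_cells:
        candidates |= cone_blocks[cell]
    return candidates


# Illustrative "conventional" design: one latch observes logic from two blocks,
# so a failure captured there cannot be pinned on either block alone.
conventional: ConeMap = {
    "sel_latch": {"issue_queue_half_0", "issue_queue_half_1"},
}

# Illustrative "transformed" design: the logic is split so that each latch
# observes a single block.
transformed: ConeMap = {
    "sel_latch_a": {"issue_queue_half_0"},
    "sel_latch_b": {"issue_queue_half_1"},
}

if __name__ == "__main__":
    print(satisfies_ici(conventional))
    print(satisfies_ici(transformed))
    print(isolate_faults(transformed, ["sel_latch_a"]))
```

In this toy model, a failing `sel_latch` in the conventional map leaves two candidate blocks, while a failing `sel_latch_a` in the transformed map leaves one. That is the block-granularity isolation the abstract attributes to ICI-compliant designs.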

