Journal ArticleDOI
Rescue: A Microarchitecture for Testability and Defect Tolerance
Ethan Schuchman,T. N. Vijaykumar +1 more
- Vol. 33, Iss: 2, pp 160-171
TLDR
This paper is the first to consider testability and fault isolation in designing modern high-performance, defect-tolerant microarchitectures, and defines intra-cycle logic independence (ICI) as the condition needed for conventional scan test to isolate faults quickly to the microarchitectural-block granularity.
Abstract:
Scaling feature size improves processor performance but increases each device's susceptibility to defects (i.e., hard errors). As a result, fabrication technology must improve significantly to maintain yields. Redundancy techniques in memory have been successful at improving yield in the presence of defects. Apart from core sparing, which disables faulty cores in a chip multiprocessor, little has been done to target the core logic. While previous work has proposed that either inherent or added redundancy in the core logic can be used to tolerate defects, the key issues of realistic testing and fault isolation have been ignored. This paper is the first to consider testability and fault isolation in designing modern high-performance, defect-tolerant microarchitectures. We define intra-cycle logic independence (ICI) as the condition needed for conventional scan test to isolate faults quickly to the microarchitectural-block granularity. We propose logic transformations to redesign conventional superscalar microarchitecture to comply with ICI. We call our novel, testable, and defect-tolerant microarchitecture Rescue. We build a Verilog model of Rescue and verify that faults can be isolated to the required precision using only conventional scan test. Using performance simulations, we show that ICI transformations reduce IPC by only 4% on average for SPEC2000 programs. Taking yield improvement into account, Rescue improves average yield-adjusted instruction throughput over core sparing by 12% and 22% at 32nm and 18nm technology nodes, respectively.
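The trade-off behind yield-adjusted throughput can be illustrated with a minimal sketch. This is not the paper's model: it assumes a simple Poisson defect model, treats any defect in the redundancy-covered part of a core as leaving that core working in a degraded mode, and uses hypothetical function names and placeholder values throughout. Only the 4% IPC cost is taken from the abstract.

```python
import math

# Only this figure comes from the abstract: ICI transformations cost ~4% IPC.
ICI_IPC_FACTOR = 0.96


def sparing_throughput(defects_per_core: float, ipc_base: float) -> float:
    """Expected per-core throughput under core sparing.

    Assumption: Poisson defects, so a core is defect-free with probability
    exp(-defects_per_core); any defect disables the whole core.
    """
    return math.exp(-defects_per_core) * ipc_base


def tolerant_throughput(
    defects_per_core: float,
    covered_fraction: float,
    ipc_base: float,
    degraded_factor: float,
) -> float:
    """Expected per-core throughput for a hypothetical defect-tolerant core.

    Assumptions (illustrative only):
      - covered_fraction of the core area is protected by redundancy;
      - defects in the covered area leave the core running at
        degraded_factor times its defect-free IPC;
      - a defect in the uncovered area disables the core.
    """
    p_fully_good = math.exp(-defects_per_core)
    p_usable = math.exp(-defects_per_core * (1.0 - covered_fraction))
    p_degraded = p_usable - p_fully_good
    ipc_tolerant = ipc_base * ICI_IPC_FACTOR
    return p_fully_good * ipc_tolerant + p_degraded * ipc_tolerant * degraded_factor


if __name__ == "__main__":
    # Placeholder inputs, not values from the paper.
    for defects in (0.0, 0.25, 1.0):
        s = sparing_throughput(defects, ipc_base=1.0)
        t = tolerant_throughput(defects, covered_fraction=0.8,
                                ipc_base=1.0, degraded_factor=0.75)
        print(f"defects/core={defects:.2f}  sparing={s:.3f}  tolerant={t:.3f}")
```

Under these assumptions the tolerant core loses slightly when defects are absent, because it always pays the IPC cost, and gains as the expected defect count per core grows, which is the qualitative trend the abstract reports across technology nodes.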
Citations
Proceedings ArticleDOI
Scalable thread scheduling and global power management for heterogeneous many-core architectures
TL;DR: This paper presents a range of scheduling and power management algorithms and performs a detailed evaluation of their effectiveness and scalability on heterogeneous many-core architectures with up to 256 cores and proposes a Hierarchical Hungarian Scheduling Algorithm that dramatically reduces the scheduling overhead without loss of accuracy.
Proceedings ArticleDOI
Architectural core salvaging in a multi-core processor for hard-error tolerance
TL;DR: It is shown that even if some individual cores cannot execute certain operations, a CPU die can be instruction-set-architecture (ISA) compliant, that is, execute all of the instructions required by its ISA, by exploiting natural cross-core redundancy.
Proceedings ArticleDOI
Architectures for online error detection and recovery in multicore processors
Dimitris Gizopoulos,Mihalis Psarakis,Sarita V. Adve,Pradeep Ramachandran,Siva Kumar Sastry Hari,Daniel J. Sorin,Albert Meixner,Arijit Biswas,Xavier Vera +8 more
TL;DR: This paper focuses on dependable multicore processor architectures that integrate solutions for online error detection, diagnosis, recovery, and repair during field operation and discusses taxonomy of representative approaches and presents a qualitative comparison based on hardware cost, performance overhead, types of faults detected, and detection latency.
Proceedings ArticleDOI
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
TL;DR: A reliable microprocessor design is developed that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy, DIVA dynamic verification, and a new scheme for diagnosing hard faults.
Proceedings ArticleDOI
Core cannibalization architecture: improving lifetime chip performance for multicore processors in the presence of hard faults
TL;DR: This work has designed and laid out CCA chips composed of multiple OpenRISC 1200 cores and shows that CCA improves the chips' lifetime performances, compared to chips without CCA.
References
Proceedings ArticleDOI
Automatically characterizing large scale program behavior
TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.
Book
Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits
TL;DR: This book provides a careful selection of essential topics on all three types of circuits, namely, digital, memory, and mixed-signal, each requiring different test and design for testability methods.
Proceedings ArticleDOI
Temperature-aware microarchitecture
Kevin Skadron,Mircea R. Stan,Wei Huang,Sivakumar Velusamy,Karthik Sankaranarayanan,David Tarjan +5 more
TL;DR: HotSpot is described, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package; the results show that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.
Journal ArticleDOI
A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology
TL;DR: The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it to easily route around defects, has significant implications for any future nanometer-scale computational paradigm.
Proceedings ArticleDOI
Complexity-effective superscalar processors
TL;DR: A microarchitecture that simplifies wakeup and selection logic is proposed and discussed, which will help minimize performance degradation due to slow bypasses in future wide-issue machines.