Journal Article

SeRA: Self-Repairing Architecture for Dark Silicon Era

31 Mar 2020 - Journal of Circuits, Systems, and Computers (World Scientific Publishing Company) - Vol. 29, Iss. 04, pp. 2050053
TL;DR: Lifetime reliability has become a major design constraint for processors in the dark silicon era, with design defects and aging as the main sources of reliability problems.
Abstract: The lifetime reliability of processors has become a major design constraint in the dark silicon era. Processor reliability issues are mainly due to design defects and aging. Unlike design defects, ...
References
Journal Article
Robert Baumann
TL;DR: In this article, the authors review the types of failure modes for soft errors, the three dominant radiation mechanisms responsible for creating soft errors in terrestrial applications, and how these soft errors are generated by the collection of radiation-induced charge.
Abstract: The once-ephemeral radiation-induced soft error has become a key threat to advanced commercial electronic components and systems. Left unchallenged, soft errors have the potential for inducing the highest failure rate of all other reliability mechanisms combined. This article briefly reviews the types of failure modes for soft errors, the three dominant radiation mechanisms responsible for creating soft errors in terrestrial applications, and how these soft errors are generated by the collection of radiation-induced charge. The soft error sensitivity as a function of technology scaling for various memory and logic components is then presented with a consideration of which applications are most likely to require soft error mitigation.

1,345 citations

Journal Article
TL;DR: In this article, the power, thermal and reliability modeling problems are explained and recent advances in their accurate and efficient analysis are surveyed.
Abstract: System integration and performance requirements are dramatically increasing the power consumption and power density of high-performance microprocessors. High power consumption introduces challenges to various aspects of microprocessor and computer system design. It increases the cost of cooling and packaging design, reduces system reliability, complicates power supply circuitry design, and reduces battery life. Researchers have recently dedicated intensive effort to power-related design problems. Modeling is the essential first step toward design optimization. In this article, the power, thermal and reliability modeling problems are explained and recent advances in their accurate and efficient analysis are surveyed.

137 citations

Proceedings Article
07 Jun 2015
TL;DR: This paper presents new trends in dark silicon reflecting, among others, the deployment of FinFETs in recent technology nodes and the impact of voltage/frequency scaling, which lead to new less-conservative predictions.
Abstract: This paper presents new trends in dark silicon reflecting, among others, the deployment of FinFETs in recent technology nodes and the impact of voltage/frequency scaling, which lead to new less-conservative predictions. The focus is on dark silicon from a thermal perspective: we show that it is not simply the chip's total power budget, e.g., the Thermal Design Power (TDP), that leads to the dark silicon problem, but instead it is the power density and related thermal effects. We therefore propose to use Thermal Safe Power (TSP) as a more efficient power budget. It is also shown that sophisticated spatio-temporal mapping decisions result in improved thermal profiles with reduced peak temperatures. Moreover, we discuss the implications of Near-Threshold Computing (NTC) and employment of Boosting techniques in dark silicon systems.

108 citations

Journal Article
TL;DR: This paper provides an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms and demonstrates the usefulness of the taxonomy by classifying state-of-the-art task-based environments in use today.
Abstract: Task-based programming models for shared memory--such as Cilk Plus and OpenMP 3--are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

107 citations

01 Jan 2015
TL;DR: In this article, the authors describe methods for the detection and isolation (diagnosis) of faults (major equipment and sensor/actuator malfunctions) in engineering systems, ranging from simpler methods that do not rely on any mathematical model of the system to more advanced model-based methods.
Abstract: The article describes the detection and isolation (diagnosis) of faults (major equipment and sensor/actuator malfunctions) in engineering systems. The simpler and less powerful methods do not rely on any mathematical model of the system; these include limit checking, special and multiple sensors, frequency analysis, and fault-tree analysis. More advanced methods use mathematical models obtained from first principles or from experimental data. Such methods include on-line parameter estimation, consistency checking, and principal component analysis. Two examples, one related to a simple electrical circuit and the other to a car-engine subsystem, demonstrate the use of some of the methods.
Keywords: fault detection; fault diagnosis; limit checking; frequency analysis; fault trees; parameter estimation; consistency relations; principal component analysis

87 citations