Author

Evelyn Mintarno

Bio: Evelyn Mintarno is an academic researcher from Stanford University. The author has contributed to research topics including clock rate and self-tuning, has an h-index of 8, and has co-authored 8 publications receiving 343 citations.

Papers
Journal Article
TL;DR: The optimized self-tuning approach satisfies performance constraints at all times, and maximizes a lifetime computational power efficiency (LCPE) metric, which is defined as the total number of clock cycles achieved over lifetime divided by the total energy consumed over lifetime.
Abstract: This paper presents an integrated framework, together with control policies, for optimizing dynamic control of self-tuning parameters of a digital system over its lifetime in the presence of circuit aging. A variety of self-tuning parameters such as supply voltage, operating clock frequency, and dynamic cooling are considered, and jointly optimized using efficient algorithms described in this paper. Our optimized self-tuning approach satisfies performance constraints at all times, and maximizes a lifetime computational power efficiency (LCPE) metric, which is defined as the total number of clock cycles achieved over lifetime divided by the total energy consumed over lifetime. We present three control policies: 1) progressive-worst-case-aging (PWCA), which assumes worst-case aging at all times; 2) progressive-on-state-aging (POSA), which estimates aging by tracking active/sleep modes, and then assumes worst-case aging in active mode and long recovery effects in sleep mode; and 3) progressive-real-time-aging-assisted (PRTA), which acquires real-time information and initiates optimized control actions. Various flavors of these control policies for systems with dynamic voltage and frequency scaling (DVFS) are also analyzed. Simulation results on benchmark circuits, using aging models validated by 45 nm measurements, demonstrate the effectiveness and practicality of our approach in significantly improving LCPE and/or lifetime compared to traditional one-time worst-case guardbanding. We also derive system design guidelines to maximize self-tuning benefits.
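For readers who want the metric written out, the LCPE definition in the abstract can be restated symbolically as below; the symbols f(t) (operating clock frequency), P(t) (power draw), C_total, and E_total are our own shorthand, not notation from the paper.

```latex
% LCPE restated symbolically (our shorthand, not the paper's notation):
% C_total = clock cycles delivered over the lifetime T (active operation),
% E_total = energy consumed over the same lifetime.
\[
  \mathrm{LCPE}
    = \frac{C_{\mathrm{total}}}{E_{\mathrm{total}}}
    = \frac{\int_{0}^{T} f(t)\,\mathrm{d}t}{\int_{0}^{T} P(t)\,\mathrm{d}t}
\]
```

Self-tuning raises LCPE by adjusting supply voltage, frequency, and cooling as the circuit ages, rather than locking them to a one-time worst-case guardband.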

89 citations

Journal Article
TL;DR: This article presents novel system-level architecture and design innovations to cope with lifetime reliability challenges from three major sources: early-life failures, radiation-induced soft errors, and circuit aging.
Abstract: The prospect of system failure has increased because of device and chip-level effects in the late CMOS era. In this article, the authors present novel system-level architecture and design innovations to cope with these lifetime reliability challenges. At nanometer-scale geometries, several hardware failure mechanisms that were largely benign in the past are becoming visible at the system level. Moreover, recent studies indicate that, depending on the application, hardware failures can be significant contributors to overall system failure rates. Designing robust systems that ensure the required hardware reliability, although nontrivial, is achievable, but at high cost; concurrent error detection during system operation is an extremely important aspect of such systems. Hardware reliability challenges arise from three major sources: early-life failures (also called infant mortality), radiation-induced soft errors, and circuit aging. Techniques such as Built-in Soft-Error Resilience (BISER) can be used effectively to correct radiation-induced transient (soft) errors; the focus here is on early-life failures (ELF) and circuit aging. The presented techniques exploit specific characteristics of the reliability mechanisms without incurring the high costs of traditional concurrent error detection.

75 citations

Proceedings Article
14 Apr 2013
TL;DR: In this article, the authors present an instance-based simulation flow, which creates a standard-cell library for each use of the cell in the design, by aging each transistor individually.
Abstract: This paper analyzes aging effects on various design hierarchies of a sub-45nm commercial processor running realistic applications. Dependencies of aging effects on the switching activity and power state of workloads are quantified. This paper presents an “instance-based” simulation flow, which creates a standard-cell library for each use of the cell in the design by aging each transistor individually. Implementation results show that processor timing degradation can vary from 2% to 11%, depending on workload. Lifetime computational power efficiency improvements of optimized self-tuning are demonstrated, relative to a one-time worst-case guardbanding approach.
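To make the "instance-based" idea concrete, here is a minimal, hypothetical sketch as we read it from the abstract: every cell instance receives its own aged characterization, driven by per-transistor stress. The helper functions, data fields, and constants are assumptions for illustration only, not the authors' tool flow.

```python
# Hypothetical sketch of an "instance-based" aging library flow (not the authors' tool).
# Each use of a standard cell in the design gets its own library entry, with every
# transistor aged according to the stress it actually sees under the workload.
from types import SimpleNamespace

def aged_threshold_shift(duty_cycle, temperature_c, years):
    """Placeholder aging model returning a Vth shift in volts; a real flow would
    plug in a calibrated BTI/HCI model here."""
    return 0.03 * duty_cycle * (years ** 0.2) * (1.0 + 0.01 * (temperature_c - 25.0))

def characterize_cell(cell_type, aged_transistors):
    """Stub characterization; in practice this step re-runs library characterization
    (e.g., SPICE) with the aged device parameters."""
    worst_vth = max(vth for _, vth in aged_transistors)
    return {"cell_type": cell_type, "worst_vth": worst_vth}

def build_instance_library(design_instances, years=5):
    """One characterized entry per *instance*, not per cell type."""
    library = {}
    for inst in design_instances:
        aged = [(t.name, t.nominal_vth +
                 aged_threshold_shift(t.duty_cycle, t.temperature_c, years))
                for t in inst.transistors]
        library[inst.name] = characterize_cell(inst.cell_type, aged)
    return library

# Minimal usage example with made-up numbers.
inst = SimpleNamespace(name="u_alu/nand2_17", cell_type="NAND2_X1",
                       transistors=[SimpleNamespace(name="MP0", nominal_vth=0.45,
                                                    duty_cycle=0.7, temperature_c=85.0)])
print(build_instance_library([inst]))
```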

64 citations

Proceedings Article
09 Oct 2009
TL;DR: This work first examines critical model assumptions in the reaction-diffusion process that is responsible for the NBTI effect, and proposes a new aging model that effectively analyzes the degradation under various low-power operations and captures the essential role of the long recovery phase in circuit aging.
Abstract: Low-power circuit operations, such as dynamic voltage scaling and the sleep mode, pose a unique challenge to aging prediction. Traditional aging models assume a constant voltage and an averaged activity factor, ignoring the impact of long sleep periods, and thus significantly overestimate the degradation rate. To accurately predict the aging effect in low-power design, this work first examines critical model assumptions in the reaction-diffusion process that is responsible for the NBTI effect. By using the correct diffusion profile, it then proposes a new aging model that effectively analyzes the degradation under various low-power operations. The new model accurately predicts the aging behavior of scaled CMOS measurement data (45nm and 65nm) with different operation patterns, especially sleep-mode operation and dynamic voltage scaling. Compared to previous aging models, the new model captures the essential role of the long recovery phase in circuit aging, reducing unnecessary guardbanding in reliability protection.
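The qualitative point about the long recovery phase can be illustrated with a deliberately crude toy calculation. This is our own illustration with arbitrary constants, not the reaction-diffusion model the paper proposes: a model that assumes continuous worst-case stress predicts more degradation than one that credits recovery during sleep.

```python
# Toy illustration (ours, not the paper's reaction-diffusion model): why ignoring
# sleep-mode recovery overestimates aging. Degradation grows as a power law under
# stress and is partially undone during recovery; all constants are arbitrary.

def degradation_always_stressed(hours, a=1.0, n=0.16):
    """Naive model: the device is assumed to be stressed for the whole interval."""
    return a * hours ** n

def degradation_with_recovery(hours, active_fraction, recovery_credit=0.5, a=1.0, n=0.16):
    """Duty-cycle-aware toy model: only active time contributes stress, and the long
    sleep periods undo a fraction of the accumulated degradation."""
    stressed = a * (hours * active_fraction) ** n
    return stressed * (1.0 - recovery_credit * (1.0 - active_fraction))

lifetime_hours = 5 * 365 * 24
naive = degradation_always_stressed(lifetime_hours)
aware = degradation_with_recovery(lifetime_hours, active_fraction=0.3)
print(f"always-stressed estimate: {naive:.3f}  duty-cycle-aware estimate: {aware:.3f}")
```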

60 citations

Proceedings Article
08 Mar 2010
TL;DR: This work presents a framework and control policies for optimizing dynamic control of various self-tuning parameters over lifetime in the presence of circuit aging, and introduces dynamic cooling as one of the self-tuning parameters, in addition to supply voltage and clock frequency.
Abstract: We present a framework and control policies for optimizing dynamic control of various self-tuning parameters over lifetime in the presence of circuit aging. Our framework introduces dynamic cooling as one of the self-tuning parameters, in addition to supply voltage and clock frequency. Our optimized self-tuning satisfies performance constraints at all times and maximizes a lifetime computational power efficiency (LCPE) metric, which is defined as the total number of clock cycles achieved over lifetime divided by the total energy consumed over lifetime. Our framework features three control policies: 1) progressive-worst-case-aging (PWCA), which assumes worst-case aging at all times; 2) progressive-on-state-aging (POSA), which estimates aging by tracking active/sleep modes, and then assumes worst-case aging in active mode and long recovery effects in sleep mode; and 3) progressive-real-time-aging-assisted (PRTA), which estimates the actual amount of aging and initiates optimized control actions. Simulation results on benchmark circuits, using aging models validated by 45nm CMOS stress measurements, demonstrate the practicality and effectiveness of our approach. We also analyze design constraints and derive system design guidelines to maximize self-tuning benefits.
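As a rough illustration of what "progressive" control means here, below is a schematic sketch of an epoch-by-epoch tuning loop. The settings table, the toy aging and delay formulas, and the function names are all our own assumptions for illustration, not the paper's implementation.

```python
# Schematic sketch (ours, not the paper's implementation) of progressive self-tuning:
# each epoch re-estimates aging according to the chosen policy, then picks the
# lowest-energy (Vdd, f) setting that still meets the performance target, instead of
# applying a single lifetime worst-case guardband.

SETTINGS = [(0.9, 1.0e9), (1.0, 1.2e9), (1.1, 1.4e9)]  # (Vdd [V], freq [Hz]), toy values

def aging_estimate(policy, years_elapsed, active_fraction, measured=None):
    worst_case = 0.05 * years_elapsed ** 0.2            # toy delay-degradation fraction
    if policy == "PWCA":    # assume worst-case aging at all times
        return worst_case
    if policy == "POSA":    # worst case while active, recovery credit while asleep
        return worst_case * active_fraction
    if policy == "PRTA":    # trust a real-time aging measurement/estimate
        return measured
    raise ValueError(policy)

def meets_performance(vdd, freq, aging, required_freq):
    degraded_freq = freq * (1.0 - aging) * (vdd / 1.0)   # toy delay/voltage model
    return degraded_freq >= required_freq

def energy_per_cycle(vdd, freq):
    return vdd ** 2                                      # CV^2-style proxy (C folded in)

def tune(policy, years, active_fraction, required_freq=1.0e9, measured=0.02):
    plan = []
    for year in range(1, years + 1):
        aging = aging_estimate(policy, year, active_fraction, measured)
        feasible = [(v, f) for v, f in SETTINGS
                    if meets_performance(v, f, aging, required_freq)]
        plan.append(min(feasible, key=lambda s: energy_per_cycle(*s)))
    return plan

print(tune("POSA", years=5, active_fraction=0.3))
```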

38 citations


Cited by
Journal Article
TL;DR: The article at hand reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years.
Abstract: Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip. Their integration is enabled by ongoing miniaturization of chip manufacturing technologies following Moore's Law. It comes with the downside of the circuit elements' increased susceptibility to failure. Research on fault-tolerant Networks-on-Chip tries to mitigate partial failure and its effect on network performance and reliability by exploiting various forms of redundancy at the suitable network layers. The article at hand reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years. It is structured along three communication layers: the data link, the network, and the transport layers. The most important results are summarized and open research problems and challenges are highlighted to guide future research on this topic.

198 citations

Proceedings Article
29 May 2013
TL;DR: In this article, the authors introduce the most prominent reliability concerns from today's point of view, roughly recapitulate the progress made by the community so far, and suggest a way of coping with reliability challenges in upcoming technology nodes.
Abstract: Reliability concerns due to technology scaling have been a major focus of researchers and designers for several technology nodes. Therefore, many new techniques for enhancing and optimizing reliability have emerged, particularly within the last five to ten years. This perspective paper introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. The focus of this paper is on perspective trends, from both industrial and academic points of view, that suggest a way of coping with reliability challenges in upcoming technology nodes.

197 citations

Journal Article
TL;DR: This paper outlines specific sensing mechanisms that have been developed and their potential use in building underdesigned and opportunistic computing machines, including a software stack that opportunistically adapts to sensed or modeled hardware.
Abstract: Microelectronic circuits exhibit increasing variations in performance, power consumption, and reliability parameters across the manufactured parts and across use of these parts over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure yield and reliability with respect to a rigid set of datasheet specifications. This paper explores the possibility of constructing computing machines that purposely expose hardware variations to various layers of the system stack including software. This leads to the vision of underdesigned hardware that utilizes a software stack that opportunistically adapts to a sensed or modeled hardware. The envisioned underdesigned and opportunistic computing (UnO) machines face a number of challenges related to the sensing infrastructure and software interfaces that can effectively utilize the sensory data. In this paper, we outline specific sensing mechanisms that we have developed and their potential use in building UnO machines.

153 citations

Proceedings Article
13 Jun 2010
TL;DR: This paper provides an overview of the post-silicon validation problem and how it differs from traditional pre-silicon verification and manufacturing testing.
Abstract: Post-silicon validation is used to detect and fix bugs in integrated circuits and systems after manufacture. Due to sheer design complexity, it is nearly impossible to detect and fix all bugs before manufacture. Post-silicon validation is a major challenge for future systems. Today, it is largely viewed as an art with very few systematic solutions. As a result, post-silicon validation is an emerging research topic with several exciting opportunities for major innovations in electronic design automation. In this paper, we provide an overview of the post-silicon validation problem and how it differs from traditional pre-silicon verification and manufacturing testing. We also discuss major post-silicon validation challenges and recent advances.

147 citations

Proceedings Article
08 Mar 2010
TL;DR: This paper examines the issue of circuit resilience, then proposes and demonstrates a roadmap for evaluating fault rates starting at the 45nm and going down to the 12nm nodes, with the hope that it will invigorate research in this area.
Abstract: Technology scaling has an increasing impact on the resilience of CMOS circuits. This outcome is the result of (a) increasing sensitivity to various intrinsic and extrinsic noise sources as circuits shrink, and (b) a corresponding increase in parametric variability causing behavior similar to what would be expected with hard (topological) faults. This paper examines the issue of circuit resilience, then proposes and demonstrates a roadmap for evaluating fault rates starting at the 45nm node and going down to the 12nm node. The complete infrastructure necessary to make these predictions is placed in the open-source domain, with the hope that it will invigorate research in this area.

104 citations