Journal ArticleDOI

VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects

TL;DR: This paper proposes a microarchitecture-aware model of process variation that captures both random and systematic effects and is specified using a small number of highly intuitive parameters.
Abstract: Within-die parameter variation poses a major challenge to high-performance microprocessor design, negatively impacting a processor's frequency and leakage power. Addressing this problem, this paper proposes a microarchitecture-aware model for process variation, including both random and systematic effects. The model is specified using a small number of highly intuitive parameters. Using the variation model, this paper also proposes a framework to model timing errors caused by parameter variation. The model yields the failure rate of microarchitectural blocks as a function of clock frequency and the amount of variation. Combining the variation model and the error model yields VARIUS, a comprehensive model capable of producing detailed statistics of timing errors as a function of different process parameters and operating conditions. We propose possible applications of VARIUS to microarchitectural research.
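
To make the model's shape concrete, here is a minimal sketch of the two ingredients the abstract names: a systematic Vth component with spherical spatial correlation plus an i.i.d. random per-gate component, fed through an alpha-power-law delay model into a Monte Carlo estimate of a block's timing-error rate versus clock period. All numbers below (3% sigma/mean, correlation range 0.5, 12-gate paths) are illustrative assumptions, not VARIUS's calibrated parameters.

```python
# Sketch of a VARIUS-style variation + timing-error model (all constants
# are illustrative assumptions, not the paper's calibrated values).
import numpy as np

rng = np.random.default_rng(0)

# Systematic Vth variation: multivariate normal over an n x n chip grid,
# correlated by a spherical correlogram of range phi (die edge = 1.0).
n = 20
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
pts = np.column_stack([xs.ravel(), ys.ravel()])
r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

phi = 0.5
rho = np.where(r < phi, 1 - 1.5 * r / phi + 0.5 * (r / phi) ** 3, 0.0)

vth_mean = 0.250                     # nominal Vth (V)
sigma_sys = 0.03 * vth_mean          # systematic sigma (sigma/mean = 3%)
sigma_rand = 0.03 * vth_mean         # random (per-gate) sigma

L = np.linalg.cholesky(sigma_sys**2 * rho + 1e-9 * np.eye(n * n))
vth_map = vth_mean + L @ rng.standard_normal(n * n)
print(f"systematic Vth across the die: {vth_map.min():.3f} .. {vth_map.max():.3f} V")

def gate_delay(vth, vdd=1.0, alpha=1.3):
    """Alpha-power-law gate delay, up to a constant factor."""
    return vdd / (vdd - vth) ** alpha

def block_error_rate(period, n_mc=5000, gates=12, paths=100):
    """P(any critical path in the block exceeds the clock period).
    The block sits at one grid point, so its systematic Vth is one draw
    from N(vth_mean, sigma_sys); each gate adds i.i.d. random Vth."""
    fails = 0
    for _ in range(n_mc):
        vth_sys = rng.normal(vth_mean, sigma_sys)
        vth_gates = vth_sys + rng.normal(0, sigma_rand, (paths, gates))
        fails += gate_delay(vth_gates).sum(axis=1).max() > period
    return fails / n_mc

# Error rate rises sharply as the period shrinks toward the nominal delay.
nominal = 12 * gate_delay(vth_mean)
for margin in (1.10, 1.05, 1.02, 1.00):
    print(f"period = {margin:.2f} x nominal -> "
          f"P(timing error) ~= {block_error_rate(margin * nominal):.3f}")
```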


Citations
Journal ArticleDOI
01 Jun 2008
TL;DR: In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED^2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
Abstract: Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both static power consumed and maximum frequency supported. In this environment, ignoring variation effects when scheduling applications or when managing power with Dynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for application scheduling and power management. One such power management algorithm, called LinOpt, uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP, maximizing throughput at a given power budget. In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED^2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
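
As a sketch of the LinOpt idea, the LP below picks per-core frequencies to maximize a throughput proxy under a chip power budget, with per-core coefficients that differ because of variation. The linearized power model and every coefficient are invented for illustration; scipy's linprog stands in for the paper's solver, and frequency alone is optimized rather than joint (V, f) pairs.

```python
# Hedged sketch of a LinOpt-style allocator (all coefficients and the
# linear power model are illustrative, not the paper's calibration).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n_cores = 20

# Variation makes cores differ: per-core max frequency (GHz) and a
# linearized power model P_i(f) ~= static_i + slope_i * f_i (W, W/GHz).
f_max = rng.uniform(3.0, 4.0, n_cores)     # slow vs. fast cores
f_min = np.full(n_cores, 1.0)
static = rng.uniform(2.0, 6.0, n_cores)    # leakage varies core to core
slope = rng.uniform(1.0, 2.0, n_cores)

budget = 150.0                             # chip power budget (W)

# Maximize total frequency (throughput proxy) => minimize -sum(f_i),
# s.t. sum(static_i + slope_i * f_i) <= budget, f_min <= f_i <= f_max.
res = linprog(c=-np.ones(n_cores),
              A_ub=[slope],
              b_ub=[budget - static.sum()],
              bounds=list(zip(f_min, f_max)),
              method="highs")
assert res.success
f_opt = res.x
print(f"throughput proxy: {f_opt.sum():.1f} GHz total, "
      f"power: {(static + slope * f_opt).sum():.1f} W")
```

The LP naturally pushes the fastest, most power-efficient cores to the highest frequencies, which is the intuition behind variation awareness; an LP also solves in microseconds, which is what makes this kind of policy plausible as an online power manager.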

351 citations

Proceedings ArticleDOI
08 Nov 2008
TL;DR: This paper shows how to hide the effects of aging and how to slow aging down, enabling a multicore designed for a 7-year service life to run, on average, at a 14-15% higher frequency throughout that service life.
Abstract: Processors progressively age during their service life due to normal workload activity. Such aging results in gradually slower circuits. Anticipating this fact, designers add timing guardbands to processors, so that processors last for a number of years. As a result, aging has important design and cost implications. To address this problem, this paper shows how to hide the effects of aging and how to slow it down. Our framework is called Facelift. It hides aging through aging-driven application scheduling. It slows down aging by applying voltage changes at key times, using a non-linear optimization algorithm to carefully balance the impact of voltage changes on the aging rate and on the critical path delays. Moreover, Facelift can gainfully configure the chip for a short service life. Simulation results indicate that Facelift leads to more cost-effective multicores. We can take a multicore designed for a 7-year service life and, by hiding and slowing down aging, enable it to run, on average, at a 14-15% higher frequency during its whole service life. Alternatively, we can design the multicore for a 5 to 7-month service life and still use it for 7 years.
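
The scheduling half of Facelift can be caricatured in a few lines: track per-core stress, and each epoch map the heaviest jobs onto the least-aged cores so that no single core sets a large aging guardband. The t^0.25 NBTI-style aging law and all constants below are illustrative assumptions, not the paper's calibrated model, and the voltage-optimization component is omitted entirely.

```python
# Hedged sketch of aging-driven scheduling in the spirit of Facelift
# (toy aging law; all constants are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(2)
n_cores, n_epochs = 8, 200

def delta_vth(stress_hours, vdd):
    """Toy NBTI-style threshold-voltage shift: grows with time^0.25
    and super-linearly with supply voltage."""
    return 5e-3 * (vdd ** 3) * stress_hours ** 0.25   # volts

def run(policy):
    stress_hours = np.zeros(n_cores)     # accumulated stress per core
    for _ in range(n_epochs):
        jobs = np.sort(rng.uniform(0.2, 1.0, n_cores))[::-1]  # heavy first
        if policy == "aging_aware":
            # Heaviest jobs go to the least-aged cores, evening out wear.
            order = np.argsort(stress_hours)
        else:
            order = np.arange(n_cores)   # naive fixed mapping
        stress_hours[order] += jobs      # heavier job => more stress time
    return delta_vth(stress_hours, vdd=1.0).max()  # worst core sets guardband

print(f"naive worst-core Vth shift: {run('naive'):.4f} V")
print(f"aware worst-core Vth shift: {run('aging_aware'):.4f} V")
# The worst-aged core determines the frequency guardband, so balancing
# stress lets the chip hold a higher frequency for longer.
```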

242 citations

Proceedings ArticleDOI
09 Mar 2015
TL;DR: This paper proposes Adaptive-Latency DRAM (AL-DRAM), a mechanism that adaptively reduces the timing parameters of DRAM modules based on the current operating conditions, and shows that dynamically optimizing the DRAM timing parameters can reliably improve system performance.
Abstract: In current systems, memory accesses to a DRAM chip must obey a set of minimum latency restrictions specified in the DRAM standard. Such timing parameters exist to guarantee reliable operation. When deciding the timing parameters, DRAM manufacturers incorporate a very large margin as a provision against two worst-case scenarios. First, due to process variation, some outlier chips are much slower than others and cannot be operated as fast. Second, chips become slower at higher temperatures, and all chips need to operate reliably at the highest supported (i.e., worst-case) DRAM temperature (85°C). In this paper, we show that typical DRAM chips operating at typical temperatures (e.g., 55°C) are capable of providing a much smaller access latency, but are nevertheless forced to operate at the worst-case latency. Our goal in this paper is to exploit the extra margin that is built into the DRAM timing parameters to improve performance. Using an FPGA-based testing platform, we first characterize the extra margin for 115 DRAM modules from three major manufacturers. Our results demonstrate that it is possible to reduce four of the most critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55°C without sacrificing correctness. Based on this characterization, we propose Adaptive-Latency DRAM (AL-DRAM), a mechanism that adaptively reduces the timing parameters for DRAM modules based on the current operating conditions. AL-DRAM does not require any changes to the DRAM chip or its interface. We evaluate AL-DRAM on a real system that allows us to reconfigure the timing parameters at runtime. We show that AL-DRAM improves the performance of memory-intensive workloads by an average of 14% without introducing any errors. We discuss and show why AL-DRAM does not compromise reliability. We conclude that dynamically optimizing the DRAM timing parameters can reliably improve system performance.
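
Mechanically, AL-DRAM reduces to a temperature-indexed lookup of characterized timing parameters, with the DRAM-standard values as the fallback. A minimal sketch follows, assuming hypothetical per-module characterization data (the paper derives real values from FPGA-based tests of 115 modules):

```python
# Hedged sketch of an AL-DRAM-style timing selector (the timing values
# below are invented for illustration, not measured characterization).
from dataclasses import dataclass

@dataclass
class DramTimings:
    tRCD: int   # row-to-column delay (DRAM clock cycles)
    tRAS: int   # row active time
    tWR: int    # write recovery time
    tRP: int    # row precharge time

# Standard (worst-case, 85C) timings vs. characterized reduced timings a
# given module was measured to sustain reliably below each temperature.
STANDARD = DramTimings(tRCD=10, tRAS=24, tWR=10, tRP=10)
CHARACTERIZED = {            # max temp (C) -> safe reduced timings
    55: DramTimings(tRCD=7, tRAS=16, tWR=7, tRP=8),
    70: DramTimings(tRCD=8, tRAS=20, tWR=8, tRP=9),
}

def select_timings(temp_c: float) -> DramTimings:
    """Pick the most aggressive characterized timings whose temperature
    ceiling covers the current reading; fall back to standard values."""
    for ceiling in sorted(CHARACTERIZED):
        if temp_c <= ceiling:
            return CHARACTERIZED[ceiling]
    return STANDARD

print(select_timings(50.0))   # typical temperature -> reduced latency
print(select_timings(80.0))   # near worst case -> standard timings
```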

236 citations

Proceedings ArticleDOI
19 Jun 2010
TL;DR: This paper considers whether exposing hardware fault information to software and allowing software to control fault recovery simplifies hardware design and helps technology scaling, and describes Relax, an architectural framework for software recovery of hardware faults.
Abstract: As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whether exposing hardware fault information to software and allowing software to control fault recovery simplifies hardware design and helps technology scaling. The combination of emerging applications and emerging many-core architectures makes software recovery a viable alternative to hardware-based fault recovery. Emerging applications tend to have few I/O and memory side-effects, which limits the amount of information that needs checkpointing, and they allow discarding individual sub-computations with small qualitative impact. Software recovery can harness these properties in ways that hardware recovery cannot. We describe Relax, an architectural framework for software recovery of hardware faults. Relax includes three core components: (1) an ISA extension that allows software to mark regions of code for software recovery, (2) a hardware organization that simplifies reliability considerations and provides energy efficiency with hardware recovery support removed, and (3) software support for compilers and programmers to utilize the Relax ISA. Applying Relax to counter the effects of process variation, our results show a 20% energy efficiency improvement for PARSEC applications with only minimal source code changes and simpler hardware.
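
Relax proper is an ISA extension, but its recover-or-discard contract can be emulated in software to show the control flow: run a marked region, and on a hardware-reported fault either re-execute it or drop its result. HardwareFault and the relax() helper below are hypothetical stand-ins, not the paper's actual interface.

```python
# Software emulation of the Relax recover/discard idiom (the fault signal
# and helper names are hypothetical stand-ins for the ISA mechanism).
import random

random.seed(4)

class HardwareFault(Exception):
    """Stand-in for the fault the hardware would report inside a region."""

def relax(computation, *, policy="retry", retries=3, default=None):
    """Run a marked region; on a fault, either re-execute it ('retry')
    or drop its result ('discard'), as Relax lets software choose."""
    for _ in range(retries if policy == "retry" else 1):
        try:
            return computation()
        except HardwareFault:
            if policy == "discard":
                return default          # sub-computation is simply skipped
    return default

def flaky_kernel(x):
    if random.random() < 0.3:           # simulated fault under variation
        raise HardwareFault
    return x * x

# Discard-on-fault suits soft applications (e.g., PARSEC-style kernels)
# where dropping a sub-computation has only a small qualitative impact.
results = [relax(lambda x=x: flaky_kernel(x), policy="discard")
           for x in range(8)]
print(results)   # faulted elements show up as None and are tolerated
```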

211 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: This paper introduces D-FGBB, which allows continuous re-evaluation of the body-bias voltages to adapt to dynamic conditions and can be synergistically combined with Dynamic Voltage and Frequency Scaling (DVFS), creating an effective means to manage power.
Abstract: Parameter variation is detrimental to a processor's frequency and leakage power. One proposed technique to mitigate it is Fine-Grain Body Biasing (FGBB), where different parts of the processor chip are given a voltage bias that changes the speed and leakage properties of their transistors. This technique has been proposed for static application, with the bias voltages being programmed at manufacturing time for worst-case conditions. In this paper, we introduce Dynamic FGBB (D-FGBB), which allows the continuous re-evaluation of the bias voltages to adapt to dynamic conditions. Our results show that D-FGBB is very versatile and effective. Specifically, with the processor working in normal mode at fixed frequency, D-FGBB reduces the leakage power of the chip by an average of 28-42% compared to static FGBB. Alternatively, with the processor working in a high-performance mode, D-FGBB increases the processor frequency by an average of 7-9% compared to static FGBB, or 7-16% compared to no body biasing. Finally, we also show that D-FGBB can be synergistically combined with Dynamic Voltage and Frequency Scaling (DVFS), creating an effective means to manage power.
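
A D-FGBB controller is, at heart, a periodic feedback loop: forward-bias blocks that miss timing so they speed up, reverse-bias blocks with slack to spare so their leakage drops. The sketch below uses an invented linear slack model and exponential leakage factor; the step sizes, ranges, and coefficients are illustrative assumptions, not the paper's circuits.

```python
# Hedged sketch of a D-FGBB-style controller loop (slack/leakage model
# and all constants are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(3)
n_blocks = 16

# Per-block timing slack at zero body bias (ns); negative = misses timing.
slack0 = rng.normal(0.05, 0.06, n_blocks)

SLACK_PER_VOLT = 0.5        # +v of forward bias buys ~0.5*v ns of slack...
LEAK_PER_VOLT = 2.0         # ...but multiplies leakage by ~exp(2.0*v)
V_STEP, V_MAX = 0.05, 0.5   # bias resolution and range (V)

bias = np.zeros(n_blocks)
for _ in range(50):                              # periodic re-evaluation
    slack = slack0 + SLACK_PER_VOLT * bias
    # Forward-bias blocks that miss timing; reverse-bias blocks whose
    # slack exceeds one step's worth, shaving leakage without failing.
    step = np.where(slack < 0, V_STEP,
                    np.where(slack > SLACK_PER_VOLT * V_STEP, -V_STEP, 0.0))
    bias = np.clip(bias + step, -V_MAX, V_MAX)

slack = slack0 + SLACK_PER_VOLT * bias
print(f"blocks meeting timing: {(slack >= 0).sum()}/{n_blocks}, "
      f"mean leakage vs. zero bias: {np.exp(LEAK_PER_VOLT * bias).mean():.2f}x")
```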

133 citations
