Journal ArticleDOI

VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects

TL;DR: This paper proposes a microarchitecture-aware model of process variation that captures both random and systematic effects and is specified using a small number of highly intuitive parameters.
Abstract: Within-die parameter variation poses a major challenge to high-performance microprocessor design, negatively impacting a processor's frequency and leakage power. Addressing this problem, this paper proposes a microarchitecture-aware model for process variation, including both random and systematic effects. The model is specified using a small number of highly intuitive parameters. Using the variation model, this paper also proposes a framework to model timing errors caused by parameter variation. The model yields the failure rate of microarchitectural blocks as a function of clock frequency and the amount of variation. Combining the variation model and the error model yields VARIUS, a comprehensive model capable of producing detailed statistics of timing errors as a function of different process parameters and operating conditions. We propose possible applications of VARIUS to microarchitectural research.
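
To make the model's shape concrete, here is a minimal sketch of the two ingredients the abstract names: a systematic Vth component with spherical spatial correlation plus an i.i.d. random per-gate component, fed through an alpha-power-law delay model into a Monte Carlo estimate of a block's timing-error rate versus clock period. All numbers below (3% sigma/mean, correlation range 0.5, 12-gate paths) are illustrative assumptions, not VARIUS's calibrated parameters.

```python
# Sketch of a VARIUS-style variation + timing-error model (all constants
# are illustrative assumptions, not the paper's calibrated values).
import numpy as np

rng = np.random.default_rng(0)

# Systematic Vth variation: multivariate normal over an n x n chip grid,
# correlated by a spherical correlogram of range phi (die edge = 1.0).
n = 20
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
pts = np.column_stack([xs.ravel(), ys.ravel()])
r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

phi = 0.5
rho = np.where(r < phi, 1 - 1.5 * r / phi + 0.5 * (r / phi) ** 3, 0.0)

vth_mean = 0.250                     # nominal Vth (V)
sigma_sys = 0.03 * vth_mean          # systematic sigma (sigma/mean = 3%)
sigma_rand = 0.03 * vth_mean         # random (per-gate) sigma

L = np.linalg.cholesky(sigma_sys**2 * rho + 1e-9 * np.eye(n * n))
vth_map = vth_mean + L @ rng.standard_normal(n * n)
print(f"systematic Vth across the die: {vth_map.min():.3f} .. {vth_map.max():.3f} V")

def gate_delay(vth, vdd=1.0, alpha=1.3):
    """Alpha-power-law gate delay, up to a constant factor."""
    return vdd / (vdd - vth) ** alpha

def block_error_rate(period, n_mc=5000, gates=12, paths=100):
    """P(any critical path in the block exceeds the clock period).
    The block sits at one grid point, so its systematic Vth is one draw
    from N(vth_mean, sigma_sys); each gate adds i.i.d. random Vth."""
    fails = 0
    for _ in range(n_mc):
        vth_sys = rng.normal(vth_mean, sigma_sys)
        vth_gates = vth_sys + rng.normal(0, sigma_rand, (paths, gates))
        fails += gate_delay(vth_gates).sum(axis=1).max() > period
    return fails / n_mc

# Error rate rises sharply as the period shrinks toward the nominal delay.
nominal = 12 * gate_delay(vth_mean)
for margin in (1.10, 1.05, 1.02, 1.00):
    print(f"period = {margin:.2f} x nominal -> "
          f"P(timing error) ~= {block_error_rate(margin * nominal):.3f}")
```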


Citations
Journal ArticleDOI
01 Jun 2008
TL;DR: In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED^2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
Abstract: Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both static power consumed and maximum frequency supported. In this environment, ignoring variation effects when scheduling applications or when managing power with Dynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for application scheduling and power management. One such power management algorithm, called LinOpt, uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP, maximizing throughput at a given power budget. In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED^2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
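
As a sketch of the LinOpt idea, the LP below picks per-core frequencies to maximize a throughput proxy under a chip power budget, with per-core coefficients that differ because of variation. The linearized power model and every coefficient are invented for illustration; scipy's linprog stands in for the paper's solver, and frequency alone is optimized rather than joint (V, f) pairs.

```python
# Hedged sketch of a LinOpt-style allocator (all coefficients and the
# linear power model are illustrative, not the paper's calibration).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n_cores = 20

# Variation makes cores differ: per-core max frequency (GHz) and a
# linearized power model P_i(f) ~= static_i + slope_i * f_i (W, W/GHz).
f_max = rng.uniform(3.0, 4.0, n_cores)     # slow vs. fast cores
f_min = np.full(n_cores, 1.0)
static = rng.uniform(2.0, 6.0, n_cores)    # leakage varies core to core
slope = rng.uniform(1.0, 2.0, n_cores)

budget = 150.0                             # chip power budget (W)

# Maximize total frequency (throughput proxy) => minimize -sum(f_i),
# s.t. sum(static_i + slope_i * f_i) <= budget, f_min <= f_i <= f_max.
res = linprog(c=-np.ones(n_cores),
              A_ub=[slope],
              b_ub=[budget - static.sum()],
              bounds=list(zip(f_min, f_max)),
              method="highs")
assert res.success
f_opt = res.x
print(f"throughput proxy: {f_opt.sum():.1f} GHz total, "
      f"power: {(static + slope * f_opt).sum():.1f} W")
```

The LP naturally pushes the fastest, most power-efficient cores to the highest frequencies, which is the intuition behind variation awareness; an LP also solves in microseconds, which is what makes this kind of policy plausible as an online power manager.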

351 citations

Proceedings ArticleDOI
08 Nov 2008
TL;DR: This paper shows how to hide the effects of aging and how to slow aging down, enabling a multicore designed for a 7-year service life to run, on average, at a 14-15% higher frequency throughout that service life.
Abstract: Processors progressively age during their service life due to normal workload activity. Such aging results in gradually slower circuits. Anticipating this fact, designers add timing guardbands to processors, so that processors last for a number of years. As a result, aging has important design and cost implications. To address this problem, this paper shows how to hide the effects of aging and how to slow it down. Our framework is called Facelift. It hides aging through aging-driven application scheduling. It slows down aging by applying voltage changes at key times, using a non-linear optimization algorithm to carefully balance the impact of voltage changes on the aging rate and on the critical path delays. Moreover, Facelift can gainfully configure the chip for a short service life. Simulation results indicate that Facelift leads to more cost-effective multicores. We can take a multicore designed for a 7-year service life and, by hiding and slowing down aging, enable it to run, on average, at a 14-15% higher frequency during its whole service life. Alternatively, we can design the multicore for a 5 to 7-month service life and still use it for 7 years.
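
The scheduling half of Facelift can be caricatured in a few lines: track per-core stress, and each epoch map the heaviest jobs onto the least-aged cores so that no single core sets a large aging guardband. The t^0.25 NBTI-style aging law and all constants below are illustrative assumptions, not the paper's calibrated model, and the voltage-optimization component is omitted entirely.

```python
# Hedged sketch of aging-driven scheduling in the spirit of Facelift
# (toy aging law; all constants are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(2)
n_cores, n_epochs = 8, 200

def delta_vth(stress_hours, vdd):
    """Toy NBTI-style threshold-voltage shift: grows with time^0.25
    and super-linearly with supply voltage."""
    return 5e-3 * (vdd ** 3) * stress_hours ** 0.25   # volts

def run(policy):
    stress_hours = np.zeros(n_cores)     # accumulated stress per core
    for _ in range(n_epochs):
        jobs = np.sort(rng.uniform(0.2, 1.0, n_cores))[::-1]  # heavy first
        if policy == "aging_aware":
            # Heaviest jobs go to the least-aged cores, evening out wear.
            order = np.argsort(stress_hours)
        else:
            order = np.arange(n_cores)   # naive fixed mapping
        stress_hours[order] += jobs      # heavier job => more stress time
    return delta_vth(stress_hours, vdd=1.0).max()  # worst core sets guardband

print(f"naive worst-core Vth shift: {run('naive'):.4f} V")
print(f"aware worst-core Vth shift: {run('aging_aware'):.4f} V")
# The worst-aged core determines the frequency guardband, so balancing
# stress lets the chip hold a higher frequency for longer.
```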

242 citations

Proceedings ArticleDOI
09 Mar 2015
TL;DR: This paper proposes Adaptive-Latency DRAM (AL-DRAM), a mechanism that adaptively reduces the timing parameters of DRAM modules based on the current operating conditions, and shows that dynamically optimizing the DRAM timing parameters can reliably improve system performance.
Abstract: In current systems, memory accesses to a DRAM chip must obey a set of minimum latency restrictions specified in the DRAM standard. Such timing parameters exist to guarantee reliable operation. When deciding the timing parameters, DRAM manufacturers incorporate a very large margin as a provision against two worst-case scenarios. First, due to process variation, some outlier chips are much slower than others and cannot be operated as fast. Second, chips become slower at higher temperatures, and all chips need to operate reliably at the highest supported (i.e., worst-case) DRAM temperature (85°C). In this paper, we show that typical DRAM chips operating at typical temperatures (e.g., 55°C) are capable of providing a much smaller access latency, but are nevertheless forced to operate at the worst-case latency. Our goal in this paper is to exploit the extra margin that is built into the DRAM timing parameters to improve performance. Using an FPGA-based testing platform, we first characterize the extra margin for 115 DRAM modules from three major manufacturers. Our results demonstrate that it is possible to reduce four of the most critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55°C without sacrificing correctness. Based on this characterization, we propose Adaptive-Latency DRAM (AL-DRAM), a mechanism that adaptively reduces the timing parameters for DRAM modules based on the current operating conditions. AL-DRAM does not require any changes to the DRAM chip or its interface. We evaluate AL-DRAM on a real system that allows us to reconfigure the timing parameters at runtime. We show that AL-DRAM improves the performance of memory-intensive workloads by an average of 14% without introducing any errors. We discuss and show why AL-DRAM does not compromise reliability. We conclude that dynamically optimizing the DRAM timing parameters can reliably improve system performance.
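
Mechanically, AL-DRAM reduces to a temperature-indexed lookup of characterized timing parameters, with the DRAM-standard values as the fallback. A minimal sketch follows, assuming hypothetical per-module characterization data (the paper derives real values from FPGA-based tests of 115 modules):

```python
# Hedged sketch of an AL-DRAM-style timing selector (the timing values
# below are invented for illustration, not measured characterization).
from dataclasses import dataclass

@dataclass
class DramTimings:
    tRCD: int   # row-to-column delay (DRAM clock cycles)
    tRAS: int   # row active time
    tWR: int    # write recovery time
    tRP: int    # row precharge time

# Standard (worst-case, 85C) timings vs. characterized reduced timings a
# given module was measured to sustain reliably below each temperature.
STANDARD = DramTimings(tRCD=10, tRAS=24, tWR=10, tRP=10)
CHARACTERIZED = {            # max temp (C) -> safe reduced timings
    55: DramTimings(tRCD=7, tRAS=16, tWR=7, tRP=8),
    70: DramTimings(tRCD=8, tRAS=20, tWR=8, tRP=9),
}

def select_timings(temp_c: float) -> DramTimings:
    """Pick the most aggressive characterized timings whose temperature
    ceiling covers the current reading; fall back to standard values."""
    for ceiling in sorted(CHARACTERIZED):
        if temp_c <= ceiling:
            return CHARACTERIZED[ceiling]
    return STANDARD

print(select_timings(50.0))   # typical temperature -> reduced latency
print(select_timings(80.0))   # near worst case -> standard timings
```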

236 citations

Proceedings ArticleDOI
19 Jun 2010
TL;DR: This paper considers whether exposing hardware fault information to software and allowing software to control fault recovery simplifies hardware design and helps technology scaling, and describes Relax, an architectural framework for software recovery of hardware faults.
Abstract: As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whether exposing hardware fault information to software and allowing software to control fault recovery simplifies hardware design and helps technology scaling. The combination of emerging applications and emerging many-core architectures makes software recovery a viable alternative to hardware-based fault recovery. Emerging applications tend to have few I/O and memory side-effects, which limits the amount of information that needs checkpointing, and they allow discarding individual sub-computations with small qualitative impact. Software recovery can harness these properties in ways that hardware recovery cannot. We describe Relax, an architectural framework for software recovery of hardware faults. Relax includes three core components: (1) an ISA extension that allows software to mark regions of code for software recovery, (2) a hardware organization that simplifies reliability considerations and provides energy efficiency with hardware recovery support removed, and (3) software support for compilers and programmers to utilize the Relax ISA. Applying Relax to counter the effects of process variation, our results show a 20% energy efficiency improvement for PARSEC applications with only minimal source code changes and simpler hardware.
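
Relax proper is an ISA extension, but its recover-or-discard contract can be emulated in software to show the control flow: run a marked region, and on a hardware-reported fault either re-execute it or drop its result. HardwareFault and the relax() helper below are hypothetical stand-ins, not the paper's actual interface.

```python
# Software emulation of the Relax recover/discard idiom (the fault signal
# and helper names are hypothetical stand-ins for the ISA mechanism).
import random

random.seed(4)

class HardwareFault(Exception):
    """Stand-in for the fault the hardware would report inside a region."""

def relax(computation, *, policy="retry", retries=3, default=None):
    """Run a marked region; on a fault, either re-execute it ('retry')
    or drop its result ('discard'), as Relax lets software choose."""
    for _ in range(retries if policy == "retry" else 1):
        try:
            return computation()
        except HardwareFault:
            if policy == "discard":
                return default          # sub-computation is simply skipped
    return default

def flaky_kernel(x):
    if random.random() < 0.3:           # simulated fault under variation
        raise HardwareFault
    return x * x

# Discard-on-fault suits soft applications (e.g., PARSEC-style kernels)
# where dropping a sub-computation has only a small qualitative impact.
results = [relax(lambda x=x: flaky_kernel(x), policy="discard")
           for x in range(8)]
print(results)   # faulted elements show up as None and are tolerated
```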

211 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: This paper introduces D-FGBB, which allows continuous re-evaluation of the body-bias voltages to adapt to dynamic conditions and can be synergistically combined with Dynamic Voltage and Frequency Scaling (DVFS), creating an effective means to manage power.
Abstract: Parameter variation is detrimental to a processor's frequency and leakage power. One proposed technique to mitigate it is Fine-Grain Body Biasing (FGBB), where different parts of the processor chip are given a voltage bias that changes the speed and leakage properties of their transistors. This technique has been proposed for static application, with the bias voltages being programmed at manufacturing time for worst-case conditions. In this paper, we introduce Dynamic FGBB (D-FGBB), which allows the continuous re-evaluation of the bias voltages to adapt to dynamic conditions. Our results show that D-FGBB is very versatile and effective. Specifically, with the processor working in normal mode at fixed frequency, D-FGBB reduces the leakage power of the chip by an average of 28-42% compared to static FGBB. Alternatively, with the processor working in a high-performance mode, D-FGBB increases the processor frequency by an average of 7-9% compared to static FGBB, or 7-16% compared to no body biasing. Finally, we also show that D-FGBB can be synergistically combined with Dynamic Voltage and Frequency Scaling (DVFS), creating an effective means to manage power.
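
A D-FGBB controller is, at heart, a periodic feedback loop: forward-bias blocks that miss timing so they speed up, reverse-bias blocks with slack to spare so their leakage drops. The sketch below uses an invented linear slack model and exponential leakage factor; the step sizes, ranges, and coefficients are illustrative assumptions, not the paper's circuits.

```python
# Hedged sketch of a D-FGBB-style controller loop (slack/leakage model
# and all constants are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(3)
n_blocks = 16

# Per-block timing slack at zero body bias (ns); negative = misses timing.
slack0 = rng.normal(0.05, 0.06, n_blocks)

SLACK_PER_VOLT = 0.5        # +v of forward bias buys ~0.5*v ns of slack...
LEAK_PER_VOLT = 2.0         # ...but multiplies leakage by ~exp(2.0*v)
V_STEP, V_MAX = 0.05, 0.5   # bias resolution and range (V)

bias = np.zeros(n_blocks)
for _ in range(50):                              # periodic re-evaluation
    slack = slack0 + SLACK_PER_VOLT * bias
    # Forward-bias blocks that miss timing; reverse-bias blocks whose
    # slack exceeds one step's worth, shaving leakage without failing.
    step = np.where(slack < 0, V_STEP,
                    np.where(slack > SLACK_PER_VOLT * V_STEP, -V_STEP, 0.0))
    bias = np.clip(bias + step, -V_MAX, V_MAX)

slack = slack0 + SLACK_PER_VOLT * bias
print(f"blocks meeting timing: {(slack >= 0).sum()}/{n_blocks}, "
      f"mean leakage vs. zero bias: {np.exp(LEAK_PER_VOLT * bias).mean():.2f}x")
```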

133 citations
