Author

Brian Greskamp

Bio: Brian Greskamp is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics including Microarchitecture and Iterative design. The author has an h-index of 8 and has co-authored 12 publications receiving 830 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, a microarchitecture-aware model for process variation is proposed, including both random and systematic effects, and the model is specified using a small number of highly intuitive parameters.
Abstract: Within-die parameter variation poses a major challenge to high-performance microprocessor design, negatively impacting a processor's frequency and leakage power. Addressing this problem, this paper proposes a microarchitecture-aware model for process variation, including both random and systematic effects. The model is specified using a small number of highly intuitive parameters. Using the variation model, this paper also proposes a framework to model timing errors caused by parameter variation. The model yields the failure rate of microarchitectural blocks as a function of clock frequency and the amount of variation. With the combination of the variation model and the error model, we have VARIUS, a comprehensive model that is capable of producing detailed statistics of timing errors as a function of different process parameters and operating conditions. We propose possible applications of VARIUS to microarchitectural research.
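A minimal sketch, in Python, of the kind of analysis such a model enables, assuming a toy alpha-power delay model and invented variation parameters (this is not the published VARIUS formulation): it samples per-block threshold-voltage variation with random and systematic components and estimates the chance that some block misses timing at a given frequency.

# Illustrative sketch only: a toy within-die variation model in the spirit of
# VARIUS. Parameter values and the alpha-power delay model are placeholder
# assumptions, not the published model.
import numpy as np

rng = np.random.default_rng(0)

VT_NOM = 0.30        # nominal threshold voltage (V), assumed
VDD = 1.0            # supply voltage (V), assumed
SIGMA_RAND = 0.015   # random Vt variation (V), assumed
SIGMA_SYS = 0.015    # systematic Vt variation (V), assumed
ALPHA = 1.3          # alpha-power-law exponent, assumed

def sample_block_delays(n_blocks, n_samples):
    """Sample relative critical-path delays for n_blocks under variation."""
    # Systematic component: one draw per block per sampled chip.
    vt_sys = rng.normal(0.0, SIGMA_SYS, size=(n_samples, n_blocks))
    # Random component: independent residual per block per sampled chip.
    vt_rand = rng.normal(0.0, SIGMA_RAND, size=(n_samples, n_blocks))
    vt = VT_NOM + vt_sys + vt_rand
    # Alpha-power law: delay ~ Vdd / (Vdd - Vt)^alpha, normalized to nominal.
    delay = (VDD / (VDD - vt) ** ALPHA) / (VDD / (VDD - VT_NOM) ** ALPHA)
    return delay

def failure_rate(freq_ratio, n_blocks=16, n_samples=20000):
    """Fraction of sampled chips with at least one block slower than the
    clock period implied by freq_ratio (frequency relative to nominal)."""
    delays = sample_block_delays(n_blocks, n_samples)
    period = 1.0 / freq_ratio          # nominal delay budget is 1.0
    chip_fails = (delays > period).any(axis=1)
    return chip_fails.mean()

if __name__ == "__main__":
    for f in (0.95, 1.00, 1.05, 1.10):
        print(f"freq x{f:.2f}: failure rate {failure_rate(f):.3f}")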

386 citations

Proceedings ArticleDOI
08 Nov 2008
TL;DR: An effective technique to maximize performance and minimize power in the presence of variation-induced errors, namely High-Dimensional dynamic adaptation, is introduced; it increases processor frequency by 56% on average, allowing the processor to cycle 21% faster than without variation.

Abstract: Parameter variation in integrated circuits causes sections of a chip to be slower than others. If, to prevent any resulting timing errors, we design processors for worst-case parameter values, we may lose substantial performance. An alternate approach explored in this paper is to design for values closer to nominal and provide some transistor budget to tolerate the unavoidable variation-induced errors. To assess this approach, this paper first presents a novel framework that shows how microarchitecture techniques can trade off variation-induced errors for power and processor frequency. Then, the paper introduces an effective technique to maximize performance and minimize power in the presence of variation-induced errors, namely High-Dimensional dynamic adaptation. For efficiency, the technique is implemented using a machine-learning algorithm. The results show that our best configuration increases processor frequency by 56% on average, allowing the processor to cycle 21% faster than without variation. Processor performance increases by 40% on average, resulting in a performance that is 14% higher than without variation, at only a 10.6% area cost.
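A hedged sketch of the adaptation idea, assuming just two knobs (relative frequency and supply voltage) and toy power, error, and performance models; the actual technique adapts many more dimensions and drives the search with a machine-learning algorithm rather than the exhaustive scan below.

# Illustrative sketch of dynamic adaptation under variation-induced errors:
# search several knobs at once to maximize performance under power and
# error-rate budgets. All models and constants below are placeholder
# assumptions, not the paper's formulation.
import itertools

FREQ_STEPS = [0.9, 1.0, 1.1, 1.2]     # relative frequency, assumed
VDD_STEPS = [0.9, 1.0, 1.1]           # relative supply voltage, assumed
POWER_BUDGET = 1.3                    # relative to nominal, assumed
ERROR_BUDGET = 0.02                   # tolerable error rate, assumed

def error_rate(freq, vdd):
    # Toy model: errors rise once frequency outpaces the voltage headroom.
    margin = vdd * 1.15 - freq
    return 0.0 if margin >= 0 else min(1.0, -margin * 0.5)

def power(freq, vdd):
    # Toy dynamic-power model: P ~ f * V^2.
    return freq * vdd ** 2

def performance(freq, err):
    # Each error costs a fixed recovery penalty, assumed.
    return freq * (1.0 - 10.0 * err)

def best_config():
    best = None
    for freq, vdd in itertools.product(FREQ_STEPS, VDD_STEPS):
        err = error_rate(freq, vdd)
        if power(freq, vdd) > POWER_BUDGET or err > ERROR_BUDGET:
            continue
        perf = performance(freq, err)
        if best is None or perf > best[0]:
            best = (perf, freq, vdd, err)
    return best

if __name__ == "__main__":
    perf, freq, vdd, err = best_config()
    print(f"chosen f={freq}, Vdd={vdd}, err={err:.3f}, perf={perf:.3f}")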

98 citations

Proceedings ArticleDOI
15 Sep 2007
TL;DR: This paper presents the Paceline leader-checker microarchitecture, in which a leader core runs the thread at higher-than-rated frequency while passing execution hints and prefetches to a safely-clocked checker core in the same chip multiprocessor.
Abstract: Under current worst-case design practices, manufacturers specify conservative values for processor frequencies in order to guarantee correctness. To recover some of the lost performance and improve single-thread performance, this paper presents the Paceline leader-checker microarchitecture. In Paceline, a leader core runs the thread at higher-than-rated frequency, while passing execution hints and prefetches to a safely-clocked checker core in the same chip multiprocessor. The checker redundantly executes the thread faster than without the leader, while checking the results to guarantee correctness. Leader and checker cores periodically swap functionality. The result is that the thread improves performance substantially without significantly increasing the power density or the hardware design complexity of the chip. By overclocking the leader by 30%, we estimate that Paceline improves SPECint and SPECfp performance by a geometric mean of 21% and 9%, respectively. Moreover, Paceline also provides tolerance to transient faults such as soft errors.
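A hedged, purely illustrative sketch of the leader/checker control flow, assuming chunked execution, state comparison, and rollback with an invented leader error probability; it omits the hints, prefetches, and checksum-based checking that Paceline actually relies on.

# Illustrative sketch of a leader/checker organization in the spirit of
# Paceline. The chunking, state comparison, and error injection below are
# placeholder assumptions, not the paper's microarchitecture.
import random

random.seed(1)

def run_chunk(state, overclocked):
    """Toy 'execution' of one chunk; the leader may corrupt state when
    overclocked (emulating a timing error)."""
    new_state = state + 1
    if overclocked and random.random() < 0.05:   # assumed error probability
        new_state += random.randint(1, 3)        # corrupted result
    return new_state

def paceline(n_chunks=100):
    arch_state = 0                    # last verified (checkpointed) state
    rollbacks = 0
    for _ in range(n_chunks):
        leader_state = run_chunk(arch_state, overclocked=True)
        checker_state = run_chunk(arch_state, overclocked=False)
        if leader_state == checker_state:
            arch_state = checker_state            # commit the chunk
        else:
            rollbacks += 1                        # squash and re-execute safely
            arch_state = run_chunk(arch_state, overclocked=False)
    return arch_state, rollbacks

if __name__ == "__main__":
    state, rollbacks = paceline()
    print(f"final state {state}, rollbacks {rollbacks}")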

89 citations

Proceedings ArticleDOI
06 Mar 2009
TL;DR: This paper presents a new approach where the processor itself is designed from the ground up for Timing Speculation, and introduces two techniques that, when applied under BlueShift, improve processor performance: On-demand Selective Biasing (OSB) and Path Constraint Tuning (PCT).
Abstract: Several recent processor designs have proposed to enhance performance by increasing the clock frequency to the point where timing faults occur, and by adding error-correcting support to guarantee correctness. However, such Timing Speculation (TS) proposals are limited in that they assume traditional design methodologies that are suboptimal under TS. In this paper, we present a new approach where the processor itself is designed from the ground up for TS. The idea is to identify and optimize the most frequently-exercised critical paths in the design, at the expense of the majority of the static critical paths, which are allowed to suffer timing errors. Our approach and design optimization algorithm are called BlueShift. We also introduce two techniques that, when applied under BlueShift, improve processor performance: On-demand Selective Biasing (OSB) and Path Constraint Tuning (PCT). Our evaluation with modules from the OpenSPARC T1 processor shows that, compared to conventional TS, BlueShift with OSB speeds up applications by an average of 8% while increasing the processor power by an average of 12%. Moreover, compared to a high-performance TS design, BlueShift with PCT speeds up applications by an average of 6% with an average processor power overhead of 23%, providing a way to speed up logic modules that is orthogonal to voltage scaling.
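An illustrative sketch of the path-selection idea, assuming invented path delays and per-cycle exercise rates and a simple greedy policy: optimize the hottest violating paths until the expected dynamic timing-error rate of the remaining violators falls below a target.

# Illustrative sketch of BlueShift-style path selection: rank timing paths by
# how often they are actually exercised, then mark the hottest ones for
# optimization (e.g. tighter constraints or biasing) while letting rarely
# exercised paths miss timing. Path data and the target error rate are
# invented for illustration.
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    delay: float          # ns, post-synthesis delay
    exercise_rate: float  # activations per cycle, from simulation traces

def select_paths(paths, clock_period, target_error_rate):
    """Greedily pick violating paths to optimize until the expected rate of
    dynamic timing errors (sum of exercise rates of remaining violators)
    drops below the target."""
    violators = [p for p in paths if p.delay > clock_period]
    violators.sort(key=lambda p: p.exercise_rate, reverse=True)
    chosen, residual = [], sum(p.exercise_rate for p in violators)
    for p in violators:
        if residual <= target_error_rate:
            break
        chosen.append(p)
        residual -= p.exercise_rate
    return chosen, residual

if __name__ == "__main__":
    paths = [
        Path("alu_carry", 1.25, 1e-2),
        Path("bypass_mux", 1.30, 4e-3),
        Path("rare_exception", 1.40, 1e-7),
    ]
    chosen, residual = select_paths(paths, clock_period=1.20,
                                    target_error_rate=1e-4)
    print("optimize:", [p.name for p in chosen],
          "residual error rate:", residual)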

89 citations

Proceedings ArticleDOI
12 Dec 2009
TL;DR: This paper proposes Dynamic Voltage Scaling for Aging Management (DVSAM), a new scheme for managing processor aging to attain higher performance or lower power consumption, and introduces the BubbleWrap many-core, a novel architecture that makes extensive use of DVSAM.

Abstract: Many-core scaling now faces a power wall. The gap between the number of cores that fit on a die and the number that can operate simultaneously under the power budget is rapidly increasing with technology scaling. In future designs, many of the cores may have to be dormant at any given time to meet the power budget. To push back the many-core power wall, this paper proposes Dynamic Voltage Scaling for Aging Management (DVSAM), a new scheme for managing processor aging to attain higher performance or lower power consumption. In addition, this paper introduces the BubbleWrap many-core, a novel architecture that makes extensive use of DVSAM. BubbleWrap identifies the most power-efficient set of cores in a variation-affected chip (the largest set that can be simultaneously powered on) and designates them as Throughput cores dedicated to parallel-section execution. The rest of the cores are designated as Expendable and are dedicated to accelerating sequential sections. BubbleWrap attains maximum sequential acceleration by sacrificing Expendable cores one at a time, running them at elevated supply voltage for a significantly shorter service life each, until they completely wear out and are discarded, figuratively as if popping bubbles in the bubble wrap that protects the Throughput cores. In simulated 32-core chips, BubbleWrap provides substantial improvements over a plain chip. For example, on average, one design runs fully-sequential applications at a 16% higher frequency, and fully-parallel ones with a 30% higher throughput.
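A hedged sketch of the Throughput/Expendable partitioning step, assuming invented per-core power values and a greedy admission policy under a fixed chip power budget; it does not model DVSAM or the wear-out schedule of Expendable cores.

# Illustrative sketch of the BubbleWrap core partitioning idea: under
# within-die variation each core has its own power cost, so pick the largest
# set of cores that fits the chip power budget (Throughput cores) and
# designate the remainder Expendable. Per-core power numbers and the greedy
# policy are placeholder assumptions.
import random

random.seed(42)

def partition_cores(core_powers, power_budget):
    """Greedily admit the lowest-power cores until the budget is exhausted."""
    order = sorted(range(len(core_powers)), key=lambda i: core_powers[i])
    throughput, used = [], 0.0
    for i in order:
        if used + core_powers[i] <= power_budget:
            throughput.append(i)
            used += core_powers[i]
    expendable = [i for i in range(len(core_powers)) if i not in throughput]
    return throughput, expendable, used

if __name__ == "__main__":
    # 32 cores whose power varies due to process variation (assumed values).
    powers = [random.uniform(0.8, 1.4) for _ in range(32)]
    budget = 24.0
    t_cores, e_cores, used = partition_cores(powers, budget)
    print(f"{len(t_cores)} Throughput cores ({used:.1f} W), "
          f"{len(e_cores)} Expendable cores")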

75 citations


Cited by

Journal ArticleDOI
01 Jun 2008
TL;DR: In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.

Abstract: Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both static power consumed and maximum frequency supported. In this environment, ignoring variation effects when scheduling applications or when managing power with Dynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for application scheduling and power management. One such power management algorithm, called LinOpt, uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP, maximizing throughput at a given power budget. In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
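A hedged sketch of the linear-programming idea behind LinOpt, assuming invented per-core throughput weights, power coefficients, and frequency ranges, and a linearized frequency knob in place of discrete V/f levels; it uses scipy.optimize.linprog and is not the published formulation.

# Illustrative sketch of LinOpt-style per-core DVFS via linear programming:
# choose each core's frequency (a linearized stand-in for its V/f level) to
# maximize total throughput under a chip power cap. All coefficients below
# are invented for illustration.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
N = 20                                      # cores, as in the 20-core CMP

f_max = rng.uniform(0.85, 1.15, N)          # variation-limited max frequency
f_min = np.full(N, 0.5)                     # assumed minimum frequency
perf_w = rng.uniform(0.8, 1.2, N)           # per-core throughput weight
dyn_w = rng.uniform(0.9, 1.1, N)            # dynamic power per unit frequency
static_p = rng.uniform(0.2, 0.5, N)         # per-core static power
power_budget = 25.0                         # assumed chip budget

# Maximize sum(perf_w * f)  <=>  minimize -perf_w . f
res = linprog(
    c=-perf_w,
    A_ub=[dyn_w],                           # sum(dyn_w * f) <= dynamic budget
    b_ub=[power_budget - static_p.sum()],
    bounds=list(zip(f_min, f_max)),
    method="highs",
)

if res.success:
    print("throughput:", float(perf_w @ res.x))
    print("frequencies:", np.round(res.x, 2))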

351 citations

Proceedings ArticleDOI
11 Oct 2009
TL;DR: A novel technique called PRES (probabilistic replay via execution sketching) is proposed to help reproduce concurrency bugs on multi-processors; it significantly lowers the production-run recording overhead of previous approaches while still reproducing most tested bugs in fewer than 10 replay attempts.
Abstract: Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions. This paper proposes a novel technique called PRES (probabilistic replay via execution sketching) to help reproduce concurrency bugs on multi-processors. It relaxes the past (perhaps idealistic) objective of "reproducing the bug on the first replay attempt" to significantly lower production-run recording overhead. This is achieved by (1) recording only partial execution information (referred to as "sketches") during the production run, and (2) relying on an intelligent replayer during diagnosis time (when performance is less critical) to systematically explore the unrecorded non-deterministic space and reproduce the bug. With only partial information, our replayer may require more than one coordinated replay run to reproduce a bug. However, after a bug is reproduced once, PRES can reproduce it every time. We implemented PRES along with five different execution sketching mechanisms. We evaluated them with 11 representative applications, including 4 servers, 3 desktop/client applications, and 4 scientific/graphics applications, with 13 real-world concurrency bugs of different types, including atomicity violations, order violations and deadlocks. PRES (with synchronization or system call sketching) significantly lowered the production-run recording overhead of previous approaches (by up to 4416 times), while still reproducing most tested bugs in fewer than 10 replay attempts. Moreover, PRES scaled well with the number of processors; PRES's feedback generation from unsuccessful replays is critical in bug reproduction.
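A toy sketch of the PRES idea, assuming an invented two-operation program with an interleaving-dependent bug: only a partial record of the production run is kept, and the replayer systematically enumerates the unrecorded orderings until the observed failure reproduces.

# Illustrative sketch of the PRES idea: the production run records only a
# partial sketch (here, the order of two racy updates is left unrecorded),
# and the replayer enumerates the unrecorded interleavings until the failure
# reproduces. The toy "program" and bug are invented for illustration.
import itertools

def run(order):
    """Toy program with an interleaving-dependent bug: the result depends on
    the order of two updates to a shared counter."""
    shared = 0
    ops = {"A": lambda v: v + 1, "B": lambda v: v * 2}
    for op in order:
        shared = ops[op](shared)
    return shared

def reproduce(observed_failure_value):
    """Replayer: explore the unrecorded interleavings of ops A and B."""
    for attempt, order in enumerate(itertools.permutations("AB"), start=1):
        if run(order) == observed_failure_value:
            return attempt, order
    return None

if __name__ == "__main__":
    # The production run failed with value 1 (B ran before A), but only a
    # sketch was recorded, not the interleaving itself.
    print(reproduce(observed_failure_value=1))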

277 citations