Author

David William Boerstler

Bio: David William Boerstler is an academic researcher at IBM. The author has contributed to research on topics including phase-locked loops and clock signals. The author has an h-index of 24 and has co-authored 158 publications receiving 2423 citations.

Topics: Phase-locked loop, Clock signal, Signal, Duty cycle, Clock domain crossing

Papers published on a yearly basis (years shown): 1990-1994, 1996-2008, 2010, 2013-2015

Papers

Journal Article

A clock distribution network for microprocessors

Phillip J. Restle¹, Timothy G. McNamara¹, David A. Webber¹, Peter J. Camporese¹, K.F. Eng¹, Keith A. Jenkins¹, D.H. Allen², M.J. Rohn², M.P. Quaranta², David William Boerstler¹, Charles J. Alpert¹, C.A. Carter¹, R.N. Bailey³, J.G. Petrovick¹, Byron L. Krauter¹, Bradley McCredie¹

IBM¹, University of Rochester², Agere Systems³

01 May 2001-IEEE Journal of Solid-state Circuits

TL;DR: A global clock distribution strategy implemented on several microprocessor chips is described, which consists of buffered, tunable tree networks, with the final trees all driving a common grid.

Abstract: A global clock distribution strategy used on several microprocessor chips is described. The clock network consists of buffered tunable trees or treelike networks, with the final level of trees all driving a single common grid covering most of the chip. This topology combines advantages of both trees and grids. A new tuning method was required to efficiently tune such a large strongly connected interconnect network consisting of up to 6 m of wire and modeled with 50000 resistors, capacitors, and inductors. Variations are described to handle different floor-planning styles. Global clock skew as low as 22 ps on large microprocessor chips has been measured.

311 citations

Journal Article

Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor

D. Pham¹, T. Aipperspach², David William Boerstler¹, M. Bolliger¹, Rajat Chaudhry¹, D. Cox², P. Harvey¹, Paul Marlan Harvey¹, Harm Peter Hofstee¹, Charles Ray Johns¹, J. Kahle¹, Atsushi Kameyama³, J. Keaty¹, Y. Masubuchi³, Mydung Pham¹, J. Pille¹, S. Posluszny¹, Mack W. Riley¹, Daniel Lawrence Stasiak¹, Masakazu Suzuoki⁴, Osamu Takahashi¹, James D. Warnock¹, S. Weitzel¹, D. Wendel¹, Kazuaki Yazawa⁴

IBM¹, University of Rochester², Toshiba³, Sony Computer Entertainment⁴

01 Jan 2006-IEEE Journal of Solid-state Circuits

TL;DR: In this paper, the design challenges that current and future processors must face, with stringent power limits, high-frequency targets, and the continuing system integration trends, are reviewed, and a first-generation Cell processor is described.

Abstract: This paper reviews the design challenges that current and future processors must face, with stringent power limits, high-frequency targets, and the continuing system integration trends. This paper then describes the architecture, circuit design, and physical implementation of a first-generation Cell processor and the design techniques used to overcome the above challenges. A Cell processor consists of a 64-bit Power Architecture processor coupled with multiple synergistic processors, a flexible IO interface, and a memory interface controller that supports multiple operating systems including Linux. This multi-core SoC, implemented in 90-nm SOI technology, achieved a high clock rate by maximizing custom circuit design while maintaining reasonable complexity through design modularity and reuse.

258 citations

Proceedings Article

5.2 Distributed system of digitally controlled microregulators enabling per-core DVFS for the POWER8™ microprocessor

Zeynep Toprak-Deniz¹, Michael A. Sperling¹, John F. Bulzacchelli¹, Gregory Scott Still¹, Ryan Kruse¹, Seongwon Kim¹, David William Boerstler¹, Tilman Gloekler¹, R. P. Robertazzi¹, Kevin Stawiasz¹, Timothy Diemoz¹, George English¹, David T. Hui¹, Paul H. Muench¹, Joshua Friedrich¹

IBM¹

06 Mar 2014

TL;DR: This paper presents an iVRM system developed for the POWER8™ microprocessor, which functions as a very fast, accurate low-dropout regulator (LDO), with 90.5% peak power efficiency (only 3.1% worse than an ideal LDO).

Abstract: Integrated voltage regulator modules (iVRMs) [1] provide a cost-effective path to realizing per-core dynamic voltage and frequency scaling (DVFS), which can be used to optimize the performance of a power-constrained multi-core processor. This paper presents an iVRM system developed for the POWER8™ microprocessor, which functions as a very fast, accurate low-dropout regulator (LDO), with 90.5% peak power efficiency (only 3.1% worse than an ideal LDO). At low output voltages, efficiency is reduced but still sufficient to realize beneficial energy savings with DVFS. Each iVRM features a bypass mode so that some of the cores can be operated at maximum performance with no regulator loss. With the iVRM area including the input decoupling capacitance (DCAP) (but not the output DCAP inherent to the cores), the iVRMs achieve a power density of 34.5 W/mm², which exceeds that of inductor-based or SC converters by at least 3.4× [2].

119 citations

Journal Article

A low-jitter PLL clock generator for microprocessors with lock range of 340-612 MHz

David William Boerstler¹

IBM¹

01 Apr 1999-IEEE Journal of Solid-state Circuits

TL;DR: A phase-locked loop (PLL) clock generator/phase aligner for the POWER3 microprocessor has been designed using a 2.5-V, 0.40-µm digital CMOS6S process.

Abstract: A fully integrated, phase-locked loop (PLL) clock generator/phase aligner for the POWER3 microprocessor has been designed using a 2.5-V, 0.40-µm digital CMOS6S process. The PLL design supports multiple integer and noninteger frequency multiplication factors for both the processor clock and an L2 cache clock. The fully differential delay-interpolating voltage-controlled oscillator (VCO) is tunable over a frequency range determined by programmable frequency limit settings, enhancing yield and application flexibility. PLL lock range for the maximum VCO frequency range settings is 340-612 MHz. The charge-pump current is programmable for additional control of the PLL loop dynamics. A differential on-chip loop filter with common-mode correction improves noise rejection. Cycle-cycle jitter measurements with the microprocessor actively executing instructions were 10.0 ps rms, 80 ps peak to peak (P-P) measured from the clock tree. Cycle-cycle jitter measured for the processor in a reset state with the clock tree active was 8.4 ps rms, 62 ps P-P. PLL area is 1040 × 640 µm². Power dissipation is <100 mW.

99 citations

Journal Article

The 12-Core POWER8™ Processor With 7.6 Tb/s IO Bandwidth, Integrated Voltage Regulation, and Resonant Clocking

Eric Fluhr¹, Steve Baumgartner², David William Boerstler¹, John F. Bulzacchelli¹, Timothy Diemoz¹, Daniel M. Dreps¹, George English¹, Joshua Friedrich¹, Anne E. Gattiker¹, Tilman Gloekler¹, Christopher Gonzalez¹, Jason D. Hibbeler¹, Keith A. Jenkins¹, Yong Kim¹, Paul H. Muench¹, Ryan Nett¹, Jose Angel Paredes¹, Juergen Pille¹, Donald W. Plass¹, Phillip J. Restle¹, R. P. Robertazzi¹, David Shan¹, David W. Siljenberg², Michael A. Sperling¹, Kevin Stawiasz¹, Gregory Scott Still¹, Zeynep Toprak-Deniz¹, James D. Warnock¹, Glen A. Wiedemeier¹, Victor Zyuban¹

IBM¹, University of Rochester²

01 Jan 2015-IEEE Journal of Solid-state Circuits

TL;DR: POWER8™ is a 12-core processor fabricated in IBM's 22 nm SOI technology with core and cache improvements driven by big data applications, providing 2.5× socket performance over POWER7+™, and power efficiency is improved with several techniques.

Abstract: POWER8™ is a 12-core processor fabricated in IBM's 22 nm SOI technology with core and cache improvements driven by big data applications, providing 2.5× socket performance over POWER7+™. Core throughput is supported by 7.6 Tb/s of off-chip I/O bandwidth, which is provided by three primary interfaces, including two new variants of Elastic Interface as well as embedded PCI Gen-3. Power efficiency is improved with several techniques. An on-chip controller based on an embedded PowerPC™ 405 processor applies per-core DVFS by adjusting DPLLs and fully integrated voltage regulators. Each voltage regulator is a highly distributed system of digitally controlled microregulators, which achieves a peak power efficiency of 90.5%. A wide frequency range resonant clock design is used in 13 clock meshes and demonstrates a minimum power savings of 4%. Power and delay efficiency is achieved through the use of pulsed-clock latches, which require statistical validation to ensure robust yield.

50 citations

Cited by

Journal Article

A 5-GHz Mesh Interconnect for a Teraflops Processor

Y. Hoskote¹, Sriram R. Vangal¹, A. Singh¹, Nitin Borkar¹, S. Borkar¹

Intel¹

01 Sep 2007-IEEE Micro

TL;DR: A multicore processor in 65-nm technology with 80 single-precision, floating-point cores delivers performance in excess of a teraflops while consuming less than 100 W.

Abstract: A multicore processor in 65-nm technology with 80 single-precision, floating-point cores delivers performance in excess of a teraflops while consuming less than 100 W. A 2D on-die mesh interconnection network operating at 5 GHz provides the high-performance communication fabric to connect the cores. The network delivers a bisection bandwidth of 2.56 terabits per second and a per-hop fall-through latency of 1 nanosecond.

658 citations

Proceedings Article

DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling

Chen Sun¹, Chia-Hsin Owen Chen¹, George Kurian¹, Lan Wei¹, Jason Miller¹, Anant Agarwal¹, Li-Shiuan Peh¹, Vladimir Stojanovic¹

Massachusetts Institute of Technology¹

09 May 2012

TL;DR: DSENT, a NoC modeling tool for rapid design space exploration of electrical and opto-electrical networks, is presented and the results show the implications of different technology scenarios and the need to reduce laser and thermal tuning power in a photonic network due to their non-data-dependent nature.

Abstract: With the rise of many-core chips that require substantial bandwidth from the network on chip (NoC), integrated photonic links have been investigated as a promising alternative to traditional electrical interconnects. While numerous opto-electronic NoCs have been proposed, evaluations of photonic architectures have thus far had to use a number of simplifications, reflecting the need for a modeling tool that accurately captures the tradeoffs for the emerging technology and its impacts on the overall network. In this paper, we present DSENT, a NoC modeling tool for rapid design space exploration of electrical and opto-electrical networks. We explain our modeling framework and perform an energy-driven case study, focusing on electrical technology scaling, photonic parameters, and thermal tuning. Our results show the implications of different technology scenarios and, in particular, the need to reduce laser and thermal tuning power in a photonic network due to their non-data-dependent nature.

529 citations

Proceedings Article

Regional congestion awareness for load balance in networks-on-chip

Paul V. Gratz¹, Boris Grot¹, Stephen W. Keckler¹

University of Texas at Austin¹

24 Oct 2008

TL;DR: Regional Congestion Awareness (RCA) is proposed, a lightweight technique to improve global network balance that informs the routing policy of congestion in parts of the network beyond adjacent routers.

Abstract: Interconnection networks-on-chip (NOCs) are rapidly replacing other forms of interconnect in chip multiprocessors and system-on-chip designs. Existing interconnection networks use either oblivious or adaptive routing algorithms to determine the route taken by a packet to its destination. Despite somewhat higher implementation complexity, adaptive routing enjoys better fault tolerance characteristics, increases network throughput, and decreases latency compared to oblivious policies when faced with non-uniform or bursty traffic. However, adaptive routing can hurt performance by disturbing any inherent global load balance through greedy local decisions. To improve load balance in adaptive routing, we propose Regional Congestion Awareness (RCA), a lightweight technique to improve global network balance. Instead of relying solely on local congestion information, RCA informs the routing policy of congestion in parts of the network beyond adjacent routers. Our experiments show that RCA matches or exceeds the performance of conventional adaptive routing across all workloads examined, with a 16% average and 71% maximum latency reduction on SPLASH-2 benchmarks running on a 49-core CMP. Compared to a baseline adaptive router, RCA incurs a negligible logic and modest wiring overhead.

409 citations

Patent

Method and system for providing advertising listing variance in distribution feeds over the internet to maximize revenue to the advertising distributor

Eric Robert Bronnimann¹, Jacob Paul Ewerdt¹, William C. Day¹, Kevin C Donovan¹, Brian Hammond¹, Ron Mccoy¹, Christopher Joseph Murphy¹, James Keith Toothman¹, Wen-Wei Wang¹

Murphy Oil¹

05 Mar 2003

TL;DR: An Internet advertisement listings provider distributes advertisements in a bid-for-placement arrangement based on the revenue efficiency of the advertisements from the bidding advertisers, calculating the revenue to the advertising distribution system by multiplying the click-through rate by the bid amount for each click-through.

Abstract: An Internet advertisement listings provider distributes advertisements in a bid-for-placement arrangement based on the revenue efficiency of the advertisements from the bidding advertisers, calculating the revenue to the advertising distribution system by multiplying the click-through rate by the bid amount for each click-through. Advertisers may be allowed to provide multiple advertisements to enable the advertisement listings provider to select from those various advertisements for inclusion in ranked listings based on a determined efficiency among the advertisements. The system also determines the most efficient grouping of advertisements for a limited-space output, comparing groupings of advertisements to other groups to determine the greater revenue to the distribution system.

397 citations

Journal Article

A 5-GHz Mesh Interconnect for a Teraflops Processor

Hoskote, Vangal, Singh, Borkar

01 Jan 2007-IEEE Micro

367 citations
