Author

H. Wilson

Bio: H. Wilson is an academic researcher from Intel. The author has contributed to research on topics including CMOS and Router. The author has an h-index of 15 and has co-authored 21 publications receiving 2786 citations.

Topics: CMOS, Router, Network packet, Ethernet, Clock gating ...

Papers

Proceedings Article

An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS

Sriram R. Vangal¹, Jason Howard¹, G. Ruhl¹, Saurabh Dighe¹, H. Wilson¹, J. Tschanz¹, D. Finan¹, P. Iyer¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, S. Venkataraman¹, Y. Hoskote¹, Nitin Borkar¹

Intel¹

18 Jun 2007

TL;DR: A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10×8 2D array of floating-point cores and packet-switched routers, operating at 4GHz, designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.

Abstract: A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10×8 2D array of floating-point cores and packet-switched routers, operating at 4GHz. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. The 65nm 100M transistor die is designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.

730 citations

Proceedings Article

A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS

Jason Howard¹, Saurabh Dighe¹, Yatin Hoskote¹, Sriram R. Vangal¹, D. Finan¹, G. Ruhl¹, David Jenkins¹, H. Wilson¹, Nitin Borkar¹, Gerhard Schrom¹, Fabrice Pailet¹, Shailendra Jain¹, Tiju Jacob¹, Satish Yada¹, Sravan K. Marella¹, Praveen Salihundam¹, Vasantha Erraguntla¹, Michael Konow¹, Michael Riepen¹, Guido Droege¹, Joerg Lindemann¹, Matthias Gries¹, Thomas Apel¹, Kersten Henriss¹, Tor Lund-Larsen¹, Sebastian Steibl¹, Shekhar Borkar¹, Vivek De¹, Rob F. Van der Wijngaart¹, Timothy G. Mattson²

Intel¹, DuPont²

18 Mar 2010

TL;DR: This paper presents a prototype chip that integrates 48 Pentium™ class IA-32 cores on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery to realize a data-center-on-a-die microprocessor architecture.

Abstract: Current developments in microprocessor design favor increased core counts over frequency scaling to improve processor performance and energy efficiency. Coupling this architectural trend with a message-passing protocol helps realize a data-center-on-a-die. The prototype chip (Figs. 5.7.1 and 5.7.7) described in this paper integrates 48 Pentium™ class IA-32 cores [1] on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery. The chip contains 1.3B transistors. Each core has a private 256KB L2 cache (12MB total on-die) and is optimized to support a message-passing-programming model whereby cores communicate through shared memory. A 16KB message-passing buffer (MPB) is present in every tile, giving a total of 384KB on-die shared memory, for increased performance. Power is kept at a minimum by transmitting dynamic, fine-grained voltage-change commands over the network to an on-die voltage-regulator controller (VRC). Further power savings are achieved through active frequency scaling at the tile granularity. Memory accesses are distributed over four on-die DDR3 controllers for an aggregate peak memory bandwidth of 21GB/s at 4× burst. Additionally, an 8-byte bidirectional system interface (SIF) provides 6.4GB/s of I/O bandwidth. The die area is 567mm² and is implemented in 45nm high-κ metal-gate CMOS [2].
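
The on-die memory totals quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above (48 cores, a 6×4 mesh, 256KB of L2 per core, 16KB of MPB per tile); the variable names are illustrative and do not come from the paper.

```python
# Cross-check of the memory totals quoted in the abstract.
# All constants come from the abstract; the names are illustrative only.
MESH_COLS, MESH_ROWS = 6, 4   # 6x4 2D mesh of tiles
CORES = 48
L2_KB_PER_CORE = 256          # private L2 per core
MPB_KB_PER_TILE = 16          # message-passing buffer per tile

tiles = MESH_COLS * MESH_ROWS               # number of tiles in the mesh
cores_per_tile = CORES // tiles             # cores sharing each tile
total_l2_mb = CORES * L2_KB_PER_CORE // 1024
total_mpb_kb = tiles * MPB_KB_PER_TILE

print(f"{tiles} tiles, {cores_per_tile} cores per tile")
print(f"L2 total: {total_l2_mb} MB, MPB total: {total_mpb_kb} KB")
```

The results agree with the abstract: 24 tiles of two cores each, 12MB of L2, and 384KB of shared message-passing buffer.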

672 citations

Journal Article

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal¹, Jason Howard¹, Greg Ruhl¹, Saurabh Dighe¹, H. Wilson¹, James W. Tschanz¹, D. Finan¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, Vasantha Erraguntla¹, Clark Roberts¹, Yatin Hoskote¹, Nitin Borkar¹, Shekhar Borkar¹

Intel¹

28 Jan 2008 - IEEE Journal of Solid-State Circuits

TL;DR: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.

Abstract: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.
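
As a rough check on how the tile count and clock rate relate to the headline throughput, the sketch below computes a theoretical peak. It assumes each FPMAC completes one multiply and one add per cycle (2 FLOPs); the abstract does not state that figure explicitly, and the function name is illustrative.

```python
# Back-of-the-envelope peak throughput for the 80-tile design.
TILES = 80                 # from the abstract
FPMACS_PER_TILE = 2        # from the abstract
FLOPS_PER_FPMAC_CYCLE = 2  # ASSUMPTION: one multiply + one add per cycle

def peak_tflops(freq_ghz: float) -> float:
    """Theoretical peak in TFLOPS at the given clock frequency (GHz)."""
    gflops = TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_CYCLE * freq_ghz
    return gflops / 1000.0

print(peak_tflops(4.0))    # design-target clock
print(peak_tflops(4.27))   # measured operating point quoted in the abstract
```

Under that assumption the 4 GHz design point gives 1.28 TFLOPS, which matches the figure in the title of the related proceedings article, and the 4.27 GHz operating point has a theoretical peak above the "over 1.0 TFLOPS" measured on benchmarks.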

645 citations

Proceedings Article

Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging

J. Tschanz¹, Nam Sung Kim¹, Saurabh Dighe¹, Jason Howard¹, G. Ruhl¹, S. Vangal¹, Siva G. Narendra, Y. Hoskote¹, H. Wilson¹, C. Lam¹, M. Shuman², Carlos Tokunaga³, Dinesh Somasekhar¹, Stephen H. Tang¹, D. Finan¹, Tanay Karnik¹, Nitin Borkar¹, Nasser A. Kurd¹, Vivek De¹

Intel¹, Oregon State University², University of Michigan³

18 Jun 2007

TL;DR: Temperature, voltage, and current sensors monitor the operation of a TCP/IP offload accelerator engine fabricated in 90nm CMOS, and a control unit dynamically changes frequency, voltage, and body bias for optimum performance and energy efficiency.

Abstract: Temperature, voltage, and current sensors monitor the operation of a TCP/IP offload accelerator engine fabricated in 90nm CMOS, and a control unit dynamically changes frequency, voltage, and body bias for optimum performance and energy efficiency. Fast response to droops and temperature changes is enabled by a multi-PLL clocking unit and on-chip body bias. Adaptive techniques are also used to compensate for performance degradation due to device aging, reducing the aging guardband.

233 citations

Proceedings Article

A 280mV-to-1.2V wide-operating-range IA-32 processor in 32nm CMOS

Shailendra Jain¹, Surhud Khare¹, Satish Yada¹, V Ambili¹, Praveen Salihundam¹, Shiva Ramani¹, Sriram Muthukumar¹, Manali Ramakrishnan Srinivasan¹, Arun Kumar¹, Shasi Kumar Gb¹, Rajaraman Ramanarayanan¹, Vasantha Erraguntla¹, Jason Howard¹, Sriram R. Vangal¹, Saurabh Dighe¹, Greg Ruhl¹, Paolo Aseron¹, H. Wilson¹, Nitin Borkar¹, Vivek De¹, Shekhar Borkar¹

Intel¹

03 Apr 2012

TL;DR: An IA-32 processor fabricated in 32nm CMOS technology is described, demonstrating a reliable ultra-low voltage operation and energy efficient performance across the wide voltage range from 280mV to 1.2V.

Abstract: Near-threshold computing brings the promise of an order of magnitude improvement in energy efficiency over the current generation of microprocessors [1]. However, frequency degradation due to aggressive voltage scaling may not be acceptable across all single-threaded or performance-constrained applications. Enabling the processor to operate over a wide voltage range helps to achieve best possible energy efficiency while satisfying varying performance demands of the applications. This paper describes an IA-32 processor fabricated in 32nm CMOS technology [2], demonstrating a reliable ultra-low voltage operation and energy efficient performance across the wide voltage range from 280mV to 1.2V.

216 citations

Cited by

Journal Article

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Yu-Hsin Chen¹, Tushar Krishna¹, Joel Emer¹, Vivienne Sze¹

Massachusetts Institute of Technology¹

01 Jan 2017 - IEEE Journal of Solid-State Circuits

TL;DR: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring the architecture.

Abstract: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also bring challenges on throughput and energy efficiency to the underlying hardware. This is because its computation requires a large amount of data, creating significant data movement from on-chip and off-chip that is more energy-consuming than computation. Minimizing data movement energy cost for any CNN shape, therefore, is the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. RS dataflow reconfigures the computation mapping of a given shape, which optimizes energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM access/multiply and accumulation (MAC) for AlexNet at 278 mW (batch size $N = 4$), and 0.7 frames/s and 0.0035 DRAM access/MAC for VGG-16 at 236 mW ($N = 3$).

2,165 citations

Journal Article

Device Requirements for Optical Interconnects to Silicon Chips

David A. B. Miller¹

Stanford University¹

10 Jun 2009

TL;DR: The current performance and future demands of interconnects to and on silicon chips are examined, and the requirements for optoelectronic and optical devices are projected if optics is to solve the major problems of interconnects for future high-performance silicon chips.

Abstract: We examine the current performance and future demands of interconnects to and on silicon chips. We compare electrical and optical interconnects and project the requirements for optoelectronic and optical devices if optics is to solve the major problems of interconnects for future high-performance silicon chips. Optics has potential benefits in interconnect density, energy, and timing. The necessity of low interconnect energy imposes low limits especially on the energy of the optical output devices, with a ~10 fJ/bit device energy target emerging. Some optical modulators and radical laser approaches may meet this requirement. Low (e.g., a few femtofarads or less) photodetector capacitance is important. Very compact wavelength splitters are essential for connecting the information to fibers. Dense waveguides are necessary on-chip or on boards for guided wave optical approaches, especially if very high clock rates or dense wavelength-division multiplexing (WDM) is to be avoided. Free-space optics potentially can handle the necessary bandwidths even without fast clocks or WDM. With such technology, however, optics may enable the continued scaling of interconnect capacity required by future chips.

1,959 citations

Proceedings Article

Parameter variations and impact on circuits and microarchitecture

Shekhar Borkar¹, Tanay Karnik¹, Siva G. Narendra¹, James W. Tschanz¹, Ali Keshavarzi¹, Vivek De¹

Intel¹

02 Jun 2003

TL;DR: Process, voltage, and temperature variations and their impact on circuits and microarchitecture are discussed, and possible solutions to reduce the impact of parameter variations and to achieve higher frequency bins are presented.

Abstract: Parameter variation in scaled technologies beyond 90nm will pose a major challenge for design of future high performance microprocessors. In this paper, we discuss process, voltage and temperature variations; and their impact on circuit and microarchitecture. Possible solutions to reduce the impact of parameter variations and to achieve higher frequency bins are also presented.

1,503 citations

Proceedings Article

Thousand core chips: a technology perspective

Shekhar Borkar¹

Intel¹

04 Jun 2007

TL;DR: The many-core architecture, with hundreds to thousands of small cores, is presented as a way to deliver unprecedented compute performance in an affordable power envelope, and fine-grain power management, memory bandwidth, on-die networks, and system resiliency are discussed.

Abstract: This paper presents the many-core architecture, with hundreds to thousands of small cores, to deliver unprecedented compute performance in an affordable power envelope. We discuss fine grain power management, memory bandwidth, on die networks, and system resiliency for the many-core system.

961 citations

Proceedings Article

The multikernel: a new OS architecture for scalable multicore systems

Andrew Baumann¹, Paul Barham², Pierre-Évariste Dagand³, Tim Harris², Rebecca Isaacs², Simon Peter¹, Timothy Roscoe¹, Adrian Schüpbach¹, Akhilesh Singhania¹

ETH Zurich¹, Microsoft², École normale supérieure de Cachan³

11 Oct 2009

TL;DR: This work investigates a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing.

Abstract: Commodity computer systems contain more and more processor cores and exhibit increasingly diverse architectural tradeoffs, including memory hierarchies, interconnects, instruction sets and variants, and IO configurations. Previous high-performance computing systems have scaled in specific cases, but the dynamic nature of modern client and server workloads, coupled with the impossibility of statically optimizing an OS for all workloads and hardware variants pose serious challenges for operating system structures. We argue that the challenge of future multicore hardware is best met by embracing the networked nature of the machine, rethinking OS architecture using ideas from distributed systems. We investigate a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing. We have implemented a multikernel OS to show that the approach is promising, and we describe how traditional scalability problems for operating systems (such as memory management) can be effectively recast using messages and can exploit insights from distributed systems and networking. An evaluation of our prototype on multicore systems shows that, even on present-day machines, the performance of a multikernel is comparable with a conventional OS, and can scale better to support future hardware.

926 citations
