Author

Greg Ruhl

Bio: Greg Ruhl is an academic researcher from Intel. The author has contributed to research on topics including CMOS and floating-point units. The author has an h-index of 4 and has co-authored 6 publications receiving 1110 citations.

Papers

Journal Article•DOI•

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal¹, Jason Howard¹, Greg Ruhl¹, Saurabh Dighe¹, H. Wilson¹, James W. Tschanz¹, D. Finan¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, Vasantha Erraguntla¹, Clark Roberts¹, Yatin Hoskote¹, Nitin Borkar¹, Shekhar Borkar¹

Intel¹

28 Jan 2008 - IEEE Journal of Solid-State Circuits

TL;DR: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.

Abstract: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.
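
As a rough check on the figures in this abstract, the peak throughput can be estimated from the tile count, the two FPMACs per tile, and the clock rate. The sketch below assumes each FPMAC retires one multiply and one add (2 FLOPs) per cycle, which the abstract does not state explicitly; the function and constant names are illustrative only.

```python
# Back-of-the-envelope peak throughput for the 80-tile design.
TILES = 80                      # from the abstract
FPMACS_PER_TILE = 2             # from the abstract
FLOPS_PER_FPMAC_PER_CYCLE = 2   # assumption: one multiply + one add per cycle


def peak_tflops(clock_ghz: float) -> float:
    """Peak TFLOPS if every FPMAC is busy on every cycle."""
    flops_per_cycle = TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE
    # flops/cycle * Gcycles/s = GFLOPS; divide by 1000 for TFLOPS.
    return flops_per_cycle * clock_ghz / 1000.0


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return tflops * 1000.0 / watts
```

Under that assumption the peak is about 1.28 TFLOPS at the 4 GHz design point and about 1.37 TFLOPS at 4.27 GHz, so the reported figure of over 1.0 TFLOPS at 97 W corresponds to at least roughly 10.3 GFLOPS/W.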

645 citations

Proceedings Article•DOI•

The 48-core SCC Processor: the Programmer's View

Timothy G. Mattson¹, Michael Riepen¹, Thomas Lehnig, Paul Brett¹, Werner Haas¹, Patrick Kennedy, Jason Howard¹, Sriram R. Vangal¹, Nitin Borkar¹, Greg Ruhl¹, Saurabh Dighe¹

Intel¹

13 Nov 2010

TL;DR: This paper describes the programmer's view of the 48-core SCC processor, an intermediate case sharing traits of message passing and shared memory architectures, and RCCE, the native message passing model created for it.

Abstract: The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures. The hardware has been described elsewhere. In this paper, we describe the programmer's view of this chip. In particular we describe RCCE: the native message passing model created for the SCC processor.

267 citations

Proceedings Article•DOI•

A 280mV-to-1.2V wide-operating-range IA-32 processor in 32nm CMOS

Shailendra Jain¹, Surhud Khare¹, Satish Yada¹, V Ambili¹, Praveen Salihundam¹, Shiva Ramani¹, Sriram Muthukumar¹, Manali Ramakrishnan Srinivasan¹, Arun Kumar¹, Shasi Kumar Gb¹, Rajaraman Ramanarayanan¹, Vasantha Erraguntla¹, Jason Howard¹, Sriram R. Vangal¹, Saurabh Dighe¹, Greg Ruhl¹, Paolo Aseron¹, H. Wilson¹, Nitin Borkar¹, Vivek De¹, Shekhar Borkar¹

Intel¹

03 Apr 2012

TL;DR: An IA-32 processor fabricated in 32nm CMOS technology is described, demonstrating a reliable ultra-low voltage operation and energy efficient performance across the wide voltage range from 280mV to 1.2V.

Abstract: Near-threshold computing brings the promise of an order of magnitude improvement in energy efficiency over the current generation of microprocessors [1]. However, frequency degradation due to aggressive voltage scaling may not be acceptable across all single-threaded or performance-constrained applications. Enabling the processor to operate over a wide voltage range helps to achieve best possible energy efficiency while satisfying varying performance demands of the applications. This paper describes an IA-32 processor fabricated in 32nm CMOS technology [2], demonstrating a reliable ultra-low voltage operation and energy efficient performance across the wide voltage range from 280mV to 1.2V.

216 citations

14.7 A 10GHz TCP Offload Accelerator for 10Gb/s Ethernet in 90nm Dual-VT CMOS

Yatin Hoskote, Vasantha Erraguntla, D. Finan, Jason Howard, Dan Klowden, Siva G. Narendra, Greg Ruhl, J. Tschanz, Sriram R. Vangal, V. Veeramachaneni, H. Wilson, Jianping Xu, Nitin Borkar

01 Jan 2003

7 citations

Proceedings Article•

2 GHz 2 Mb 2T Gain Cell Memory Macro With 128 GBytes/sec Bandwidth in a 65 nm Logic Process Technology

Dinesh Somasekhar, Yibin Daleye, Paolo Aseron¹, Shih-Lien Lu¹, Muhammad M. Khellah¹, Jason Howard¹, Greg Ruhl¹, Tanay Karnik¹, Shekhar Borkar¹, Vivek De¹, Ali Keshavarzi

Intel¹

01 Jan 2009

TL;DR: In this paper, the authors present a 2 Mb 2T PMOS gain cell macro on a 65 nm logic process that has a high bandwidth of 128 GBytes/sec, a fast cycle time of 2 ns, and a 6-clock access time at 2 GHz.

Abstract: We present a 2 Mb 2T PMOS gain cell macro on a 65 nm logic process that has a high bandwidth of 128 GBytes/sec, a fast cycle time of 2 ns, and a 6-clock-cycle access time at 2 GHz. The macro features a full-rate pipelined architecture, ground precharge bitline, nondestructive read-out, partial write support, and 128-row refresh to tolerate short refresh time. The cell is 2X denser than SRAM and is voltage compatible with logic.
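
The headline numbers in this abstract can be related with simple arithmetic. The sketch below assumes 1 GByte = 10^9 bytes and that the full-rate pipeline moves data on every clock; neither is stated in the abstract, and the function names are hypothetical.

```python
# Relating bandwidth, clock rate, and latency for the gain cell macro.

def bytes_per_clock(bandwidth_gbytes_per_s: float, clock_ghz: float) -> float:
    """Bytes moved per clock, assuming data is transferred every cycle."""
    return bandwidth_gbytes_per_s / clock_ghz


def clocks_to_ns(clocks: float, clock_ghz: float) -> float:
    """Convert a latency expressed in clock cycles to nanoseconds."""
    return clocks / clock_ghz


def ns_to_clocks(ns: float, clock_ghz: float) -> float:
    """Convert a time in nanoseconds to a number of clock cycles."""
    return ns * clock_ghz
```

With the abstract's figures this gives 64 bytes (512 bits) per clock, a 3 ns access time for 6 clocks, and a 2 ns cycle time that spans 4 clocks at 2 GHz.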

4 citations

Cited by

Journal Article•DOI•

Device Requirements for Optical Interconnects to Silicon Chips

David A. B. Miller¹

Stanford University¹

10 Jun 2009

TL;DR: The current performance and future demands of interconnects to and on silicon chips are examined, and the requirements for optoelectronic and optical devices are projected if optics is to solve the major problems of interconnects for future high-performance silicon chips.

Abstract: We examine the current performance and future demands of interconnects to and on silicon chips. We compare electrical and optical interconnects and project the requirements for optoelectronic and optical devices if optics is to solve the major problems of interconnects for future high-performance silicon chips. Optics has potential benefits in interconnect density, energy, and timing. The necessity of low interconnect energy imposes low limits especially on the energy of the optical output devices, with a ~ 10 fJ/bit device energy target emerging. Some optical modulators and radical laser approaches may meet this requirement. Low (e.g., a few femtofarads or less) photodetector capacitance is important. Very compact wavelength splitters are essential for connecting the information to fibers. Dense waveguides are necessary on-chip or on boards for guided wave optical approaches, especially if very high clock rates or dense wavelength-division multiplexing (WDM) is to be avoided. Free-space optics potentially can handle the necessary bandwidths even without fast clocks or WDM. With such technology, however, optics may enable the continued scaling of interconnect capacity required by future chips.

1,959 citations

Book•

FPGA Architecture: Survey and Challenges

Ian Kuon¹, Russell Tessier², Jonathan Rose¹

University of Toronto¹, University of Massachusetts Amherst²

18 Apr 2008

TL;DR: This survey reviews the historical development of programmable logic devices, the fundamental programming technologies that the programmability is built on, and then describes the basic understandings gleaned from research on architectures.

Abstract: Field-Programmable Gate Arrays (FPGAs) have become one of the key digital circuit implementation media over the last decade. A crucial part of their creation lies in their architecture, which governs the nature of their programmable logic functionality and their programmable interconnect. FPGA architecture has a dramatic effect on the quality of the final device's speed performance, area efficiency, and power consumption. This survey reviews the historical development of programmable logic devices, the fundamental programming technologies that the programmability is built on, and then describes the basic understandings gleaned from research on architectures. We include a survey of the key elements of modern commercial FPGA architecture, and look toward future trends in the field.

491 citations

Proceedings Article•DOI•

Firefly: illuminating future network-on-chip with nanophotonics

Yan Pan¹, Prabhat Kumar¹, John Kim², Gokhan Memik¹, Yu Zhang¹, Alok Choudhary¹

Northwestern University¹, KAIST²

20 Jun 2009

TL;DR: Firefly is a hybrid, hierarchical network architecture that consists of clusters of nodes connected using conventional electrical signaling, while inter-cluster communication uses nanophotonics. This exploits the benefits of electrical signaling for short, local communication and reserves nanophotonics for global communication to realize an efficient on-chip network.

Abstract: Future many-core processors will require high-performance yet energy-efficient on-chip networks to provide a communication substrate for the increasing number of cores. Recent advances in silicon nanophotonics create new opportunities for on-chip networks. To efficiently exploit the benefits of nanophotonics, we propose Firefly - a hybrid, hierarchical network architecture. Firefly consists of clusters of nodes that are connected using conventional, electrical signaling while the inter-cluster communication is done using nanophotonics - exploiting the benefits of electrical signaling for short, local communication while nanophotonics is used only for global communication to realize an efficient on-chip network. Crossbar architecture is used for inter-cluster communication. However, to avoid global arbitration, the crossbar is partitioned into multiple, logical crossbars and their arbitration is localized. Our evaluations show that Firefly improves the performance by up to 57% compared to an all-electrical concentrated mesh (CMESH) topology on adversarial traffic patterns and up to 54% compared to an all-optical crossbar (OP XBAR) on traffic patterns with locality. If the energy-delay-product is compared, Firefly improves the efficiency of the on-chip network by up to 51% and 38% compared to CMESH and OP XBAR, respectively.

411 citations

Proceedings Article•DOI•

Is dark silicon useful?: harnessing the four horsemen of the coming dark silicon apocalypse

Michael Taylor¹

University of California, San Diego¹

03 Jun 2012

TL;DR: Four key approaches, the four horsemen, that have emerged as top contenders for thriving in the dark silicon age are discussed; each class carries with its virtues deep-seated restrictions that require a careful understanding of the underlying tradeoffs and benefits.

Abstract: Due to the breakdown of Dennardian scaling, the percentage of a silicon chip that can switch at full frequency is dropping exponentially with each process generation. This utilization wall forces designers to ensure that, at any point in time, large fractions of their chips are effectively dark or dim silicon, i.e., either idle or significantly underclocked. As exponentially larger fractions of a chip's transistors become dark, silicon area becomes an exponentially cheaper resource relative to power and energy consumption. This shift is driving a new class of architectural techniques that “spend” area to “buy” energy efficiency. All of these techniques seek to introduce new forms of heterogeneity into the computational stack. We envision that ultimately we will see widespread use of specialized architectures that leverage these techniques in order to attain orders-of-magnitude improvements in energy efficiency. However, many of these approaches also suffer from massive increases in complexity. As a result, we will need to look towards developing pervasively specialized architectures that insulate the hardware designer and the programmer from the underlying complexity of such systems. In this paper, I discuss four key approaches, the four horsemen, that have emerged as top contenders for thriving in the dark silicon age. Each class carries with its virtues deep-seated restrictions that require a careful understanding of the underlying tradeoffs and benefits.

334 citations

Journal Article•DOI•

Cascaded Microresonator-Based Matrix Switch for Silicon On-Chip Optical Interconnection

Andrew Wing On Poon¹, Xianshu Luo¹, Fang Xu¹, Hui Chen¹

Hong Kong University of Science and Technology¹

16 Jun 2009

TL;DR: This paper emphasizes the recently proposed 5 × 5 matrix switch comprising two-dimensionally cascaded microring resonator-based electrooptic switches coupled to a waveguide cross-grid on a silicon chip, and studies the feasibility of large-scale integration of the matrix switch.

Abstract: This paper reviews developments in cascaded microresonator-based matrix switches for silicon photonic interconnection networks in many-core computing applications. Specifically, we emphasize our recently proposed 5 × 5 matrix switch comprising two-dimensionally cascaded microring resonator-based electrooptic switches coupled to a waveguide cross-grid on a silicon chip. The cross-grid adopts low-loss low-crosstalk multimode-interference-based waveguide crossings. Such a microresonator-based matrix switch offers nonblocking interconnections among multiple inputs and multiple outputs, with the key merits of i) a tens to hundreds of micrometers-scale footprint, ii) gigabit/second-scale data transmission, iii) nanosecond-speed circuit-switching, iv) 100-µW-scale DC power consumption per link, and v) large-scale integration for networks-on-chips applications. We analyze in detail the microring resonator-based cross-grid switch design for high-data-rate signal transmission in the context of our proposed 5 × 5 matrix switch. We also study the feasibility of large-scale integration of the matrix switch. We report proof-of-concept experiments of a single cross-grid switch element and a 2 × 2 matrix switch, propose design guidelines, and discuss future engineering challenges.

272 citations
