Author

Shane L. Bell

Bio: Shane L. Bell is an academic researcher from Tilera. The author has contributed to research on topics including digital clock managers and reduced instruction set computing (RISC). The author has an h-index of 8 and has co-authored 12 publications receiving 1,630 citations. Previous affiliations of Shane L. Bell include Hewlett-Packard.

Papers

TILE64 - Processor: A 64-Core SoC with Mesh Interconnect

Shane L. Bell, Bruce S. Edwards, John Amann, Rich Conlin, Kevin Joyce, Vince Leung, John MacKay, Mike Reif, Liewei Bao, J.F. Brown, Matthew Mattina, Chyi-Chang Miao, Carl Ramey, David Wentzlaff, Walker Anderson, Ethan Berger, Nat Fairbanks, Durlov Khan, Froilan Montenegro, Jay Stickney, John Zook

01 Jan 2010

TL;DR: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications in networking and digital multimedia, with 64 tile processors arranged in an 8x8 array.

Abstract: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications in networking and digital multimedia. A figure shows a block diagram with 64 tile processors arranged in an 8x8 array. These tiles connect through a scalable 2D mesh network with high-speed I/Os on the periphery. Each general-purpose processor is identical and capable of running SMP Linux.

634 citations

Proceedings Article

TILE64 - Processor: A 64-Core SoC with Mesh Interconnect

Shane L. Bell, Bruce S. Edwards, John Amann, Richard Conlin, Kevin Joyce, V. Leung, J. MacKay, M. Reif, Liewei Bao, J.F. Brown, Matthew Mattina, Chyi-Chang Miao, Carl Ramey, David Wentzlaff, W. Anderson, E. Berger, N. Fairbanks, D. Khan, F. Montenegro, J. Stickney, J. Zook

01 Feb 2008

TL;DR: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications in networking and digital multimedia.

587 citations

Patent

Computing in parallel processing environments

Patrick Robert Griffin¹, Mathew Hostetter¹, Anant Agarwal¹, Chyi-Chang Miao¹, Christopher D. Metcalf¹, Bruce Edwards¹, Carl Ramey¹, Mark B. Rosenbluth¹, David Wentzlaff¹, Christopher J. Jackson¹, Ben Harrison¹, Kenneth Steele¹, John Amann¹, Shane L. Bell¹, Richard Conlin¹, Kevin Joyce¹, Christine Deignan¹, Liewei Bao¹, Matthew Mattina¹, Ian Rudolf Bratt¹, Richard Schooler¹

Tilera¹

11 Apr 2017

TL;DR: In this patent, each core's processor is coupled to a communication network among the cores, and a switch in each core includes switching circuitry to forward data received over data paths from other cores to the processor and to the switches of other cores.

Abstract: A computing system comprises one or more cores. Each core comprises a processor. In some implementations, each processor is coupled to a communication network among the cores. In some implementations, a switch in each core includes switching circuitry to forward data received over data paths from other cores to the processor and to switches of other cores, and to forward data received from the processor to switches of other cores.

202 citations

Proceedings Article

Design of an 8-wide superscalar RISC microprocessor with simultaneous multithreading

R.P. Preston, R.W. Badeau, D.W. Bailey, Shane L. Bell, L.L. Biro, William J. Bowhill, D.E. Dever, S. Felix, R. Gammack, V. Germini, M.K. Gowan, Paul E. Gronowski, D.B. Jackson, Swati Mehta, S.V. Morton, J.D. Pickholtz, Matthew H. Reilly, M.J. Smith

07 Aug 2002

TL;DR: A 250M-transistor microprocessor implements the Alpha instruction set and features 8-wide superscalar issue and simultaneous multithreading in a 0.125-μm SOI process.

Abstract: A 250M-transistor microprocessor implements the Alpha instruction set and features 8-wide superscalar issue and simultaneous multithreading in a 0.125-μm SOI process. Performance is estimated at over three times that of the previous design.

92 citations

Journal Article

Circuit implementation of a 300-MHz 64-bit second-generation CMOS Alpha CPU

William J. Bowhill, Shane L. Bell, B.J. Benschneider, Andrew J. Black, Sharon M. Britton, Ruben W. Castelino, Dale R. Donchin, John H. Edmondson, Harry R. Fair, Paul E. Gronowski, Anil K. Jain, Patricia L. Kroesen, Marc E. Lamere, Bruce J. Loughlin, Shekhar Mehata, Sribalan Santhanam, Timothy A. Shedd, Stephen C. Thierauf, Robert O. Mueller, R.P. Preston, Michael J. Smith

02 Jan 1995 - Digital Technical Journal

TL;DR: A 300-MHz, custom 64-bit VLSI, second-generation Alpha CPU chip has been developed; it can issue four instructions per cycle and delivers 1,200 MIPS / 600 MFLOPS (peak).

Abstract: A 300-MHz, custom 64-bit VLSI, second-generation Alpha CPU chip has been developed. The chip was designed in a 0.5-μm CMOS technology using four levels of metal. The die size is 16.5 mm by 18.1 mm, contains 9.3 million transistors, operates at 3.3 V, and supports 3.3-V/5.0-V interfaces. Power dissipation is 50 W. It contains an 8-KB instruction cache, an 8-KB data cache, and a 96-KB unified second-level cache. The chip can issue four instructions per cycle and delivers 1,200 MIPS / 600 MFLOPS (peak). Several noteworthy circuit and implementation techniques were used to attain the target operating frequency.

83 citations

Cited by

Proceedings Article

Wattch: a framework for architectural-level power analysis and optimizations

David Brooks¹, Vivek Tiwari², Margaret Martonosi¹

Princeton University¹, Intel²

01 May 2000

TL;DR: This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture level. By providing a power evaluation methodology within the portable and familiar SimpleScalar framework, Wattch opens up the field of power-efficient computing to a wider range of researchers.

Abstract: Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities. This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture level. Wattch is 1000X or more faster than existing layout-level power tools, and yet maintains accuracy within 10% of their estimates as verified using industry tools on leading-edge designs. This paper presents several validations of Wattch's accuracy. In addition, we present three examples that demonstrate how architects or compiler writers might use Wattch to evaluate power consumption in their design process. We see Wattch as a complement to existing lower-level tools; it allows architects to explore and cull the design space early on, using faster, higher-level tools. It also opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

2,848 citations

Journal Article

The future of microprocessors

Shekhar Borkar¹, Andrew A. Chien²

Intel¹, University of California, San Diego²

01 May 2011 - Communications of the ACM

TL;DR: Energy efficiency is the new fundamental limiter of processor performance, way beyond numbers of processors.

Abstract: Energy efficiency is the new fundamental limiter of processor performance, way beyond numbers of processors.

920 citations

Journal Article

The Alpha 21264 microprocessor

R.E. Kessler

01 Mar 1999 - IEEE Micro

TL;DR: A unique combination of high clock speeds and advanced microarchitectural techniques, including many forms of out-of-order and speculative execution, provide exceptional core computational performance in the 21264.

Abstract: Alpha microprocessors have been performance leaders since their introduction in 1992. The first-generation 21064 and the later 21164 raised expectations for the newest generation: performance leadership was again a goal of the 21264 design team. Benchmark scores of 30+ SPECint95 and 58+ SPECfp95 offer convincing evidence thus far that the 21264 achieves this goal and will continue to set a high performance standard. A unique combination of high clock speeds and advanced microarchitectural techniques, including many forms of out-of-order and speculative execution, provide exceptional core computational performance in the 21264. The processor also features a high-bandwidth memory system that can quickly deliver data values to the execution core, providing robust performance for a wide range of applications, including those without cache locality. The advanced performance levels are attained while maintaining an installed application base. All Alpha generations are upward-compatible. Database, real-time visual computing, data mining, medical imaging, scientific/technical, and many other applications can utilize the outstanding performance available with the 21264.

828 citations

Proceedings Article

System level analysis of fast, per-core DVFS using on-chip switching regulators

Wonyoung Kim¹, Meeta S. Gupta¹, Gu-Yeon Wei¹, David Brooks¹

Harvard University¹

24 Oct 2008

TL;DR: It is concluded that on-chip regulators can significantly improve DVFS effectiveness and lead to overall system energy savings in a CMP, but architects must carefully account for overheads and costs when designing next-generation DVFS systems and algorithms.

Abstract: Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in digital systems, but the effectiveness of DVFS is hampered by slow voltage transitions that occur on the order of tens of microseconds. In addition, the recent trend towards chip-multiprocessors (CMP) executing multi-threaded workloads with heterogeneous behavior motivates the need for per-core DVFS control mechanisms. Voltage regulators that are integrated onto the same chip as the microprocessor core provide the benefit of both nanosecond-scale voltage switching and per-core voltage control. We show that these characteristics provide significant energy-saving opportunities compared to traditional off-chip regulators. However, the implementation of on-chip regulators presents many challenges including regulator efficiency and output voltage transient characteristics, which are significantly impacted by the system-level application of the regulator. In this paper, we describe and model these costs, and perform a comprehensive analysis of a CMP system with on-chip integrated regulators. We conclude that on-chip regulators can significantly improve DVFS effectiveness and lead to overall system energy savings in a CMP, but architects must carefully account for overheads and costs when designing next-generation DVFS systems and algorithms.

758 citations

Proceedings Article

Selective cache ways: on-demand cache resource allocation

David H. Albonesi¹

University of Rochester¹

16 Nov 1999

TL;DR: By trading a small performance degradation for energy savings, selective cache ways can significantly reduce cache energy dissipation.

Abstract: Increasing levels of microprocessor power dissipation call for new approaches at the architectural level that save energy by better matching of on-chip resources to application requirements. Selective cache ways provides the ability to disable a subset of the ways in a set associative cache during periods of modest cache activity, while the full cache may remain operational for more cache-intensive periods. Because this approach leverages the subarray partitioning that is already present for performance reasons, only minor changes to a conventional cache are required, and therefore, full-speed cache operation can be maintained. Furthermore, the tradeoff between performance and energy is flexible, and can be dynamically tailored to meet changing application and machine environmental conditions. We show that trading off a small performance degradation for energy savings can produce a significant reduction in cache energy dissipation using this approach.

733 citations
