Author

Bevan M. Baas

Other affiliations: Stanford University, Qualcomm Atheros

Bio: Bevan M. Baas is an academic researcher at the University of California, Davis. The author has contributed to research on the topics Throughput (business) and Clock rate. The author has an h-index of 27 and has co-authored 85 publications receiving 2541 citations. Previous affiliations of Bevan M. Baas include Stanford University and Qualcomm Atheros.

Papers published on a yearly basis (chart; years covered: 1996, 1999, and 2001 to 2022)

Papers

Journal Article•DOI•

A low-power, high-performance, 1024-point FFT processor

Bevan M. Baas¹•Institutions (1)

Stanford University¹

01 Mar 1999-IEEE Journal of Solid-state Circuits

TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 µm CMOS process and is fully functional on first-pass silicon.

Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460,000-transistor design has been fabricated in a standard 0.7 µm (L_poly = 0.6 µm) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 µs while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz, which is a clock rate 2.6 times greater than the previously fastest rate.
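
As a rough check on the figures quoted in this abstract, the energy spent on one transform at 1.1 V follows from power multiplied by execution time. The sketch below is illustrative only: the function name is hypothetical, and it does not attempt to reproduce the paper's "adjusted" energy-efficiency metric, whose normalization the abstract does not state.

```python
def energy_per_transform_uj(power_w: float, time_s: float) -> float:
    """Energy in microjoules for one transform, using E = P * t."""
    return power_w * time_s * 1e6


# Figures quoted in the abstract: 9.5 mW sustained for 330 us at 1.1 V.
energy_uj = energy_per_transform_uj(power_w=9.5e-3, time_s=330e-6)
print(f"{energy_uj:.3f} uJ per 1024-point complex FFT")
```

Under these figures, one 1024-point complex FFT costs a little over 3 µJ.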

319 citations

Journal Article•DOI•

Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm

Aaron Stillmaker¹,², Bevan M. Baas¹•Institutions (2)

University of California, Davis¹, California State University, Fresno²

01 Jun 2017-Integration

TL;DR: This work fits second- and third-order polynomials to circuit delay, energy, and power dissipation results from HSpice simulations that use the Predictive Technology Model (PTM) and International Technology Roadmap for Semiconductors (ITRS) models.

211 citations

Journal Article•DOI•

A 167-Processor Computational Platform in 65 nm CMOS

Dean N. Truong¹, Wayne H. Cheng¹, Tinoosh Mohsenin¹, Zhiyi Yu², Anthony T. Jacobson³, Gouri Landge¹, M.J. Meeuwsen⁴, Christine Watnik¹, Anh T. Tran¹, Zhibin Xiao¹, E. Work¹, Jeremy Webb¹, Paul Mejia¹, Bevan M. Baas¹•Institutions (4)

University of California, Davis¹, Fudan University², University of California, Berkeley³, Intel⁴

24 Mar 2009-IEEE Journal of Solid-state Circuits

TL;DR: A 167-processor computational platform, implemented in 65 nm CMOS, consists of an array of simple programmable processors capable of per-processor dynamic supply voltage and clock frequency scaling, three algorithm-specific processors, and three 16 KB shared memories.

Abstract: A 167-processor computational platform consists of an array of simple programmable processors capable of per-processor dynamic supply voltage and clock frequency scaling, three algorithm-specific processors, and three 16 KB shared memories; and is implemented in 65 nm CMOS. All processors and shared memories are clocked by local fully independent, dynamically haltable, digitally-programmable oscillators and are interconnected by a configurable circuit-switched network which supports long-distance communication. Programmable processors occupy 0.17 mm² and operate at a maximum clock frequency of 1.2 GHz at 1.3 V. At 1.2 V, they operate at 1.07 GHz and consume 47.5 mW when 100% active, resulting in an energy dissipation of 44 pJ per operation. At 0.675 V, they operate at 66 MHz and consume 608 µW when 100% active, resulting in a total energy dissipation of 9.2 pJ per ALU or MAC operation.
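
The per-operation energy figures in this abstract can be approximately reproduced by dividing active power by clock frequency. The sketch below is a minimal check, not the authors' methodology: the function name is hypothetical, and it assumes one ALU or MAC operation per clock cycle at 100% activity, which the abstract implies but does not state outright.

```python
def energy_per_op_pj(power_w: float, clock_hz: float,
                     ops_per_cycle: float = 1.0) -> float:
    """Energy per operation in picojoules.

    ops_per_cycle=1.0 is an assumption made for this sketch.
    """
    return power_w / (clock_hz * ops_per_cycle) * 1e12


# Operating points quoted in the abstract.
at_1p2_v = energy_per_op_pj(power_w=47.5e-3, clock_hz=1.07e9)   # 1.2 V
at_0p675_v = energy_per_op_pj(power_w=608e-6, clock_hz=66e6)    # 0.675 V

print(f"1.2 V:   {at_1p2_v:.1f} pJ/op")
print(f"0.675 V: {at_0p675_v:.1f} pJ/op")
```

Both results round to the quoted 44 pJ and 9.2 pJ, which suggests the one-operation-per-cycle assumption is consistent with the reported numbers.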

203 citations

Proceedings Article•DOI•

Split-Row: A Reduced Complexity, High Throughput LDPC Decoder Architecture

Tinoosh Mohsenin¹, Bevan M. Baas¹•Institutions (1)

University of California, Davis¹

01 Oct 2006

TL;DR: The proposed split-row method makes column processing parallelism easier to exploit, doubles available row processor parallelism, and significantly simplifies row processors - which results in smaller area, higher speeds, and lower energy dissipation.

Abstract: A reduced complexity LDPC decoding method is presented that dramatically reduces wire interconnect complexity, which is a major issue in LDPC decoders. The proposed split-row method makes column processing parallelism easier to exploit, doubles available row processor parallelism, and significantly simplifies row processors - which results in smaller area, higher speeds, and lower energy dissipation. Simulation results over an additive white Gaussian channel show that the error performance of high row-weight codes with split-row decoding is within 0.3-0.6 dB of the min-sum and sum-product decoding algorithms. A full parallel decoder for a (3,6) LDPC code with a code length of 1536 bits is implemented in a 0.18 µm CMOS technology twice: once using the split-row method, and once using the min-sum algorithm for comparison. The split-row decoder operates at 53 MHz and delivers a throughput of 5.4 Gbps with 15 decoding iterations per block. The split-row decoder is about 1.3 times smaller, has an average wire length 1.5 times shorter, and has a throughput 1.6 times higher than the min-sum decoder.
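
The quoted throughput is consistent with a simple model of a fully parallel decoder. The sketch below assumes, as an illustration rather than a statement from the paper, that each decoding iteration takes one clock cycle and that throughput counts all 1536 code bits per block; the function and parameter names are hypothetical.

```python
def decoder_throughput_gbps(code_length_bits: int, clock_hz: float,
                            iterations_per_block: int,
                            cycles_per_iteration: int = 1) -> float:
    """Throughput in Gbps for a decoder that processes one block at a time.

    cycles_per_iteration=1 is an assumption made for this sketch.
    """
    blocks_per_second = clock_hz / (iterations_per_block * cycles_per_iteration)
    return code_length_bits * blocks_per_second / 1e9


# Figures quoted in the abstract: 1536-bit code, 53 MHz, 15 iterations.
split_row_gbps = decoder_throughput_gbps(1536, 53e6, 15)
print(f"{split_row_gbps:.2f} Gbps")
```

The result of roughly 5.4 Gbps matches the abstract, so the one-cycle-per-iteration reading appears to fit the reported figures.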

111 citations

Proceedings Article•DOI•

An integrated 802.11a baseband and MAC processor

J. Thomson¹, Bevan M. Baas, E.M. Cooper, J.M. Gilbert, G. Hsieh, P. Husted, A. Lokanathan, J.S. Kuskin, D. McCracken, B. McFarland, Teresa H. Meng, D. Nakahira, Sam Ng, M. Rattehalli, Jeffrey L. Smith, Ramanan Subramanian, L. Than, Yi-Hsiu Wang, R. Yu, Xiaoru Zhang•Institutions (1)

Qualcomm Atheros¹

07 Aug 2002

TL;DR: A 0.25 µm CMOS mixed-signal baseband and MAC processor for the IEEE 802.11a WLAN standard occupies 6.8 × 6.8 mm² and contains 4.0M transistors in a 196-pin BGA package.

Abstract: A 0.25 µm CMOS mixed-signal baseband and MAC processor for the IEEE 802.11a WLAN standard occupies 6.8 × 6.8 mm² and contains 4.0M transistors in a 196-pin BGA package. Power consumption for transmit and receive is 326 mW and 452 mW, respectively. Additional data rates up to 108 Mb/s are supported. The MAC is implemented using dedicated control and datapath logic, and includes registers that allow host software to configure and control its operation. This yields an overall design that is compact, power-efficient, and requires no off-chip RAM or program storage, yet is very flexible.

107 citations

Cited by

DOI•

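Cleanliness requirements of the International Technology Roadmap for Semiconductors 2003: cleanliness required of silicon wafer surfaces and the ambient environment, and the current state of analysis methods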

飯田裕幸, 竹田菊男, 藤本武利

20 Sep 2004

1,387 citations

Journal Article•DOI•

Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives

Radu Marculescu¹, Umit Y. Ogras¹, Li-Shiuan Peh², Natalie Enright Jerger³, Yatin Hoskote⁴•Institutions (4)

Carnegie Mellon University¹, Princeton University², University of Wisconsin-Madison³, Intel⁴

01 Jan 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.

Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

733 citations

Book•

IEEE transactions on computer-aided design of integrated circuits and systems : a publication of the IEEE Circuits and Systems Society

IEEE Circuits

01 Jan 1982

729 citations

Journal Article•DOI•

A 5-GHz Mesh Interconnect for a Teraflops Processor

Y. Hoskote¹, Sriram R. Vangal¹, A. Singh¹, Nitin Borkar¹, S. Borkar¹•Institutions (1)

Intel¹

01 Sep 2007-IEEE Micro

TL;DR: A multicore processor in 65-nm technology with 80 single-precision, floating-point cores delivers performance in excess of a Teraflops while consuming less than 100 W.

Abstract: A multicore processor in 65-nm technology with 80 single-precision, floating-point cores delivers performance in excess of a Teraflops while consuming less than 100 W. A 2D on-die mesh interconnection network operating at 5 GHz provides the high-performance communication fabric to connect the cores. The network delivers a bisection bandwidth of 2.56 Terabits per second and a per hop fall-through latency of 1 nanosecond.

658 citations

Journal Article•DOI•

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal¹, Jason Howard¹, Greg Ruhl¹, Saurabh Dighe¹, H. Wilson¹, James W. Tschanz¹, D. Finan¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, Vasantha Erraguntla¹, Clark Roberts¹, Yatin Hoskote¹, Nitin Borkar¹, Shekhar Borkar¹•Institutions (1)

Intel¹

28 Jan 2008-IEEE Journal of Solid-state Circuits

TL;DR: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.

Abstract: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.

645 citations
