Author

Albert Ou

Bio: Albert Ou is an academic researcher at the University of California, Berkeley. The author has contributed to research on the topics System on a chip and Field-programmable gate array. The author has an h-index of 4 and has co-authored 11 publications receiving 925 citations.

Papers

Journal Article•DOI•

Single-chip microprocessor that communicates directly using light

Chen Sun¹,², Mark T. Wade³, Yunsup Lee¹, Jason S. Orcutt²,⁴, Luca Alloatti², Michael Georgas², Andrew Waterman¹, Jeffrey M. Shainline³,⁴, Rimas Avizienis¹, Sen Lin¹, Benjamin Moss², Rajesh Kumar³, Fabio Pavanello³, Amir H. Atabaki², Henry Cook¹, Albert Ou¹, Jonathan Leu², Yu-Hsin Chen², Krste Asanovic¹, Rajeev J. Ram², Milos A. Popovic³, Vladimir Stojanovic¹

University of California, Berkeley¹, Massachusetts Institute of Technology², University of Colorado Boulder³, National Institute of Standards and Technology⁴

24 Dec 2015-Nature

TL;DR: This demonstration could represent the beginning of an era of chip-scale electronic–photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.

Abstract: An electronic–photonic microprocessor chip manufactured using a conventional microelectronics foundry process is demonstrated; the chip contains 70 million transistors and 850 photonic components and directly uses light to communicate to other chips. The rapid transfer of data between chips in computer systems and data centres has become one of the bottlenecks in modern information processing. One way of increasing speeds is to use optical connections rather than electrical wires and the past decade has seen significant efforts to develop silicon-based nanophotonic approaches to integrate such links within silicon chips, but incompatibility between the manufacturing processes used in electronics and photonics has proved a hindrance. Now Chen Sun et al. describe a 'system on a chip' microprocessor that successfully integrates electronics and photonics yet is produced using standard microelectronic chip fabrication techniques. The resulting microprocessor combines 70 million transistors and 850 photonic components and can communicate optically with the outside world. This result promises a way forward for new fast, low-power computing systems architectures. Data transport across short electrical wires is limited by both bandwidth and power density, which creates a performance bottleneck for semiconductor microchips in modern computer systems, from mobile phones to large-scale data centres. These limitations can be overcome [1–3] by using optical communications based on chip-scale electronic–photonic systems [4–7] enabled by silicon-based nanophotonic devices [8]. However, combining electronics and photonics on the same chip has proved challenging, owing to microchip manufacturing conflicts between electronics and photonics. Consequently, current electronic–photonic chips [9–11] are limited to niche manufacturing processes and include only a few optical devices alongside simple circuits.
Here we report an electronic–photonic system on a single chip integrating over 70 million transistors and 850 photonic components that work together to provide logic, memory, and interconnect functions. This system is a realization of a microprocessor that uses on-chip photonic devices to directly communicate with other chips using light. To integrate electronics and photonics at the scale of a microprocessor chip, we adopt a ‘zero-change’ approach to the integration of photonics. Instead of developing a custom process to enable the fabrication of photonics [12], which would complicate or eliminate the possibility of integration with state-of-the-art transistors at large scale and at high yield, we design optical devices using a standard microelectronics foundry process that is used for modern microprocessors [13–16]. This demonstration could represent the beginning of an era of chip-scale electronic–photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.

1,058 citations

Journal Article•DOI•

Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs

Alon Amid¹, David Biancolin¹, Abraham Gonzalez¹, Daniel Grubb¹, Sagar Karandikar¹, Harrison Liew¹, Albert Magyar¹, Howard Mao¹, Albert Ou¹, Nathan Pemberton¹, Paul Rigge¹, Colin Schmidt¹, John Wright¹, Jerry Zhao¹, Yakun Sophia Shao¹, Krste Asanovic¹, Borivoje Nikolic¹

University of California, Berkeley¹

01 Jul 2020-IEEE Micro

TL;DR: The Chipyard framework is presented, an integrated SoC design, simulation, and implementation environment for specialized compute systems, that includes configurable, composable, open-source, generator-based IP blocks that can be used across multiple stages of the hardware development flow while maintaining design intent and integration consistency.

Abstract: Continued improvement in computing efficiency requires functional specialization of hardware designs. Agile hardware design methodologies have been proposed to alleviate the increased design costs of custom silicon architectures, but their practice thus far has been accompanied with challenges in integration and validation of complex systems-on-a-chip (SoCs). We present the Chipyard framework, an integrated SoC design, simulation, and implementation environment for specialized compute systems. Chipyard includes configurable, composable, open-source, generator-based IP blocks that can be used across multiple stages of the hardware development flow while maintaining design intent and integration consistency. Through cloud-hosted FPGA accelerated simulation and rapid ASIC implementation, Chipyard enables continuous validation of physically realizable customized systems.

114 citations

Posted Content•

Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures

Hasan Genc, Ameer Haj-Ali, Vighnesh Iyer, Alon Amid, Howard Mao, John Wright, Colin Schmidt, Jerry Zhao, Albert Ou, Max Banister, Yakun Sophia Shao, Borivoje Nikolic, Ion Stoica, Krste Asanovic

22 Nov 2019

TL;DR: Gemmini, an open-source and agile systolic array generator enabling systematic evaluations of deep-learning architectures, is presented; selected design points achieve two to three orders of magnitude speedup in deep neural network inference compared to the baseline execution on a host processor.

Abstract: Advances in deep learning and neural networks have resulted in the rapid development of hardware accelerators that support them. A large majority of ASIC accelerators, however, target a single hardware design point to accelerate the main computational kernels of deep neural networks such as convolutions or matrix multiplication. On the other hand, the spectrum of use-cases for neural network accelerators, ranging from edge devices to cloud, presents a prime opportunity for agile hardware design and generator methodologies. We present Gemmini -- an open source and agile systolic array generator enabling systematic evaluations of deep-learning architectures. Gemmini generates a custom ASIC accelerator for matrix multiplication based on a systolic array architecture, complete with additional functions for neural network inference. Gemmini runs with the RISC-V ISA, and is integrated with the Rocket Chip System-on-Chip generator ecosystem, including Rocket in-order cores and BOOM out-of-order cores. Through an elaborate design space exploration case study, this work demonstrates the selection processes of various parameters for the use-case of inference on edge devices. Selected design points achieve two to three orders of magnitude speedup in deep neural network inference compared to the baseline execution on a host processor. Gemmini-generated accelerators were used in the fabrication of test systems-on-chip in TSMC 16nm and Intel 22FFL process technologies.

45 citations

Proceedings Article•DOI•

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Hasan Genc¹, Seah Kim¹, Alon Amid¹, Ameer Haj-Ali¹, Vighnesh Iyer¹, Pranav Prakash¹, Jerry Zhao¹, Daniel Grubb¹, Harrison Liew¹, Howard Mao¹, Albert Ou¹, Colin Schmidt¹, Samuel Steffl¹, John Wright¹, Ion Stoica¹, Jonathan Ragan-Kelley², Krste Asanovic¹, Borivoje Nikolic¹, Yakun Sophia Shao¹

University of California, Berkeley¹, Massachusetts Institute of Technology²

05 Dec 2021

TL;DR: Gemmini is an open-source, full-stack DNN accelerator generator that generates a wide design space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects.

Abstract: DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks.

35 citations

Proceedings Article•DOI•

FirePerf: FPGA-Accelerated Full-System Hardware/Software Performance Profiling and Co-Design

Sagar Karandikar¹, Albert Ou¹, Alon Amid¹, Howard Mao¹, Randy H. Katz¹, Borivoje Nikolic¹, Krste Asanovic¹

University of California, Berkeley¹

09 Mar 2020

TL;DR: This work enables agile full-system performance optimization for hardware/software systems with FirePerf, a set of novel out-of-band system-level performance profiling capabilities integrated into the open-source FireSim FPGA-accelerated hardware simulation platform.

Abstract: Achieving high performance when developing specialized hardware/software systems requires understanding and improving not only core compute kernels, but also intricate and elusive system-level bottlenecks. Profiling these bottlenecks requires both high-fidelity introspection and the ability to run sufficiently many cycles to execute complex software stacks, a challenging combination. In this work, we enable agile full-system performance optimization for hardware/software systems with FirePerf, a set of novel out-of-band system-level performance profiling capabilities integrated into the open-source FireSim FPGA-accelerated hardware simulation platform. Using out-of-band call stack reconstruction and automatic performance counter insertion, FirePerf enables introspecting into hardware and software at appropriate abstraction levels to rapidly identify opportunities for software optimization and hardware specialization, without disrupting end-to-end system behavior like traditional profiling tools. We demonstrate the capabilities of FirePerf with a case study that optimizes the hardware/software stack of an open-source RISC-V SoC with an Ethernet NIC to achieve 8x end-to-end improvement in achievable bandwidth for networking applications running on Linux. We also deploy a RISC-V Linux kernel optimization discovered with FirePerf on commercial RISC-V silicon, resulting in up to 1.72x improvement in network performance.

17 citations

Cited by

Journal Article•

Early results of coronary artery bypass grafting using only bilateral internal thoracic arteries with Y-anastomosis in patients with multivessel coronary artery disease

성기익, 이영탁, 박계현, 전태국, 박표원, 한일용, 장윤희

01 Mar 2003-The Korean Journal of Thoracic and Cardiovascular Surgery

28,685 citations

Journal Article•DOI•

Deep learning with coherent nanophotonic circuits

Yichen Shen¹, Nicholas C. Harris¹, Scott Skirlo¹, Dirk Englund¹, Marin Soljacic¹

Massachusetts Institute of Technology¹

01 Jul 2017

TL;DR: A new architecture for a fully optical neural network is demonstrated that enables a computational speed enhancement of at least two orders of magnitude and three orders of magnitude in power efficiency over state-of-the-art electronics.

Abstract: Artificial Neural Networks have dramatically improved performance for many machine learning tasks. We demonstrate a new architecture for a fully optical neural network that enables a computational speed enhancement of at least two orders of magnitude and three orders of magnitude in power efficiency over state-of-the-art electronics.

1,955 citations

Journal Article•DOI•

Integrated lithium niobate electro-optic modulators operating at CMOS-compatible voltages

Cheng Wang¹,², Mian Zhang¹, Xi Chen³, Maxime Bertrand¹,⁴, Amirhassan Shams-Ansari¹,⁵, Sethumadhavan Chandrasekhar³, Peter J. Winzer³, Marko Loncar¹

Harvard University¹, City University of Hong Kong², Bell Labs³, University of Bordeaux⁴, University of Washington⁵

24 Sep 2018-Nature

TL;DR: Monolithically integrated lithium niobate electro-optic modulators that feature a CMOS-compatible drive voltage, support data rates up to 210 gigabits per second and show an on-chip optical loss of less than 0.5 decibels are demonstrated.

Abstract: Electro-optic modulators translate high-speed electronic signals into the optical domain and are critical components in modern telecommunication networks [1,2] and microwave-photonic systems [3,4]. They are also expected to be building blocks for emerging applications such as quantum photonics [5,6] and non-reciprocal optics [7,8]. All of these applications require chip-scale electro-optic modulators that operate at voltages compatible with complementary metal–oxide–semiconductor (CMOS) technology, have ultra-high electro-optic bandwidths and feature very low optical losses. Integrated modulator platforms based on materials such as silicon, indium phosphide or polymers have not yet been able to meet these requirements simultaneously because of the intrinsic limitations of the materials used. On the other hand, lithium niobate electro-optic modulators, the workhorse of the optoelectronic industry for decades [9], have been challenging to integrate on-chip because of difficulties in microstructuring lithium niobate. The current generation of lithium niobate modulators are bulky, expensive, limited in bandwidth and require high drive voltages, and thus are unable to reach the full potential of the material. Here we overcome these limitations and demonstrate monolithically integrated lithium niobate electro-optic modulators that feature a CMOS-compatible drive voltage, support data rates up to 210 gigabits per second and show an on-chip optical loss of less than 0.5 decibels. We achieve this by engineering the microwave and photonic circuits to achieve high electro-optical efficiencies, ultra-low optical losses and group-velocity matching simultaneously. Our scalable modulator devices could provide cost-effective, low-power and ultra-high-speed solutions for next-generation optical communication networks and microwave photonic systems.
Furthermore, our approach could lead to large-scale ultra-low-loss photonic circuits that are reconfigurable on a picosecond timescale, enabling a wide range of quantum and classical applications [5,10,11] including feed-forward photonic quantum computation. Chip-scale lithium niobate electro-optic modulators that rapidly convert electrical to optical signals and use CMOS-compatible voltages could prove useful in optical communication networks, microwave photonic systems and photonic computation.

1,358 citations

Journal Article•DOI•

All-optical spiking neurosynaptic networks with self-learning capabilities.

Johannes Feldmann¹, Nathan Youngblood², C.D. Wright³, Harish Bhaskaran², Wolfram H. P. Pernice¹

University of Münster¹, University of Oxford², University of Exeter³

08 May 2019-Nature

TL;DR: An optical version of a brain-inspired neurosynaptic system, using wavelength division multiplexing techniques, is presented that is capable of supervised and unsupervised learning.

Abstract: Software implementations of brain-inspired computing underlie many important computational tasks, from image processing to speech recognition, artificial intelligence and deep learning applications. Yet, unlike real neural tissue, traditional computing architectures physically separate the core computing functions of memory and processing, making fast, efficient and low-energy computing difficult to achieve. To overcome such limitations, an attractive alternative is to design hardware that mimics neurons and synapses. Such hardware, when connected in networks or neuromorphic systems, processes information in a way more analogous to brains. Here we present an all-optical version of such a neurosynaptic system, capable of supervised and unsupervised learning. We exploit wavelength division multiplexing techniques to implement a scalable circuit architecture for photonic neural networks, successfully demonstrating pattern recognition directly in the optical domain. Such photonic neurosynaptic networks promise access to the high speed and high bandwidth inherent to optical systems, thus enabling the direct processing of optical telecommunication and visual data. An optical version of a brain-inspired neurosynaptic system, using wavelength division multiplexing techniques, is presented that is capable of supervised and unsupervised learning.

862 citations

Journal Article•DOI•

Integrating photonics with silicon nanoelectronics for the next generation of systems on a chip.

Amir H. Atabaki¹, Sajjad Moazeni², Fabio Pavanello³,⁴, Hayk Gevorgyan⁵, Jelena Notaros¹,⁴, Luca Alloatti¹,⁶, Mark T. Wade⁴, Chen Sun², Seth Kruger⁷, Huaiyu Meng¹, Kenaish Al Qubaisi⁵, Imbert Wang⁵, Bohan Zhang⁵, Anatol Khilo⁵, Christopher Baiocco⁷, Milos A. Popovic⁵, Vladimir Stojanovic², Rajeev J. Ram¹

Massachusetts Institute of Technology¹, University of California, Berkeley², Ghent University³, University of Colorado Boulder⁴, Boston University⁵, ETH Zurich⁶, State University of New York System⁷

18 Apr 2018-Nature

TL;DR: A way of integrating photonics with silicon nanoelectronics is described, using polycrystalline silicon on glass islands alongside transistors on bulk silicon complementary metal–oxide–semiconductor chips to address the demand for high-bandwidth optical interconnects in data centres and high-performance computing.

Abstract: Electronic and photonic technologies have transformed our lives, from computing and mobile devices to information technology and the internet. Our future demands in these fields require innovation in each technology separately, but also depend on our ability to harness their complementary physics through integrated solutions [1,2]. This goal is hindered by the fact that most silicon nanotechnologies (which enable our processors, computer memory, communications chips and image sensors) rely on bulk silicon substrates, a cost-effective solution with an abundant supply chain, but with substantial limitations for the integration of photonic functions. Here we introduce photonics into bulk silicon complementary metal–oxide–semiconductor (CMOS) chips using a layer of polycrystalline silicon deposited on silicon oxide (glass) islands fabricated alongside transistors. We use this single deposited layer to realize optical waveguides and resonators, high-speed optical modulators and sensitive avalanche photodetectors. We integrated this photonic platform with a 65-nanometre-transistor bulk CMOS process technology inside a 300-millimetre-diameter-wafer microelectronics foundry. We then implemented integrated high-speed optical transceivers in this platform that operate at ten gigabits per second, composed of millions of transistors, and arrayed on a single optical bus for wavelength division multiplexing, to address the demand for high-bandwidth optical interconnects in data centres and high-performance computing [3,4]. By decoupling the formation of photonic devices from that of transistors, this integration approach can achieve many of the goals of multi-chip solutions [5], but with the performance, complexity and scalability of 'systems on a chip' [1,6–8]. As transistors smaller than ten nanometres across become commercially available [9], and as new nanotechnologies emerge [10,11], this approach could provide a way to integrate photonics with state-of-the-art nanoelectronics.

630 citations
