scispace - formally typeset
Search or ask a question
Author

Albert Ou

Bio: Albert Ou is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: System on a chip & Field-programmable gate array. The author has an hindex of 4, co-authored 11 publications receiving 925 citations.

Papers
More filters
Journal ArticleDOI
24 Dec 2015-Nature
TL;DR: This demonstration could represent the beginning of an era of chip-scale electronic–photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.
Abstract: An electronic–photonic microprocessor chip manufactured using a conventional microelectronics foundry process is demonstrated; the chip contains 70 million transistors and 850 photonic components and directly uses light to communicate to other chips. The rapid transfer of data between chips in computer systems and data centres has become one of the bottlenecks in modern information processing. One way of increasing speeds is to use optical connections rather than electrical wires and the past decade has seen significant efforts to develop silicon-based nanophotonic approaches to integrate such links within silicon chips, but incompatibility between the manufacturing processes used in electronics and photonics has proved a hindrance. Now Chen Sun et al. describe a 'system on a chip' microprocessor that successfully integrates electronics and photonics yet is produced using standard microelectronic chip fabrication techniques. The resulting microprocessor combines 70 million transistors and 850 photonic components and can communicate optically with the outside world. This result promises a way forward for new fast, low-power computing systems architectures. Data transport across short electrical wires is limited by both bandwidth and power density, which creates a performance bottleneck for semiconductor microchips in modern computer systems—from mobile phones to large-scale data centres. These limitations can be overcome1,2,3 by using optical communications based on chip-scale electronic–photonic systems4,5,6,7 enabled by silicon-based nanophotonic devices8. However, combining electronics and photonics on the same chip has proved challenging, owing to microchip manufacturing conflicts between electronics and photonics. Consequently, current electronic–photonic chips9,10,11 are limited to niche manufacturing processes and include only a few optical devices alongside simple circuits. Here we report an electronic–photonic system on a single chip integrating over 70 million transistors and 850 photonic components that work together to provide logic, memory, and interconnect functions. This system is a realization of a microprocessor that uses on-chip photonic devices to directly communicate with other chips using light. To integrate electronics and photonics at the scale of a microprocessor chip, we adopt a ‘zero-change’ approach to the integration of photonics. Instead of developing a custom process to enable the fabrication of photonics12, which would complicate or eliminate the possibility of integration with state-of-the-art transistors at large scale and at high yield, we design optical devices using a standard microelectronics foundry process that is used for modern microprocessors13,14,15,16. This demonstration could represent the beginning of an era of chip-scale electronic–photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.

1,058 citations

Journal ArticleDOI
TL;DR: The Chipyard framework is presented, an integrated SoC design, simulation, and implementation environment for specialized compute systems, that includes configurable, composable, open-source, generator-based IP blocks that can be used across multiple stages of the hardware development flow while maintaining design intent and integration consistency.
Abstract: Continued improvement in computing efficiency requires functional specialization of hardware designs. Agile hardware design methodologies have been proposed to alleviate the increased design costs of custom silicon architectures, but their practice thus far has been accompanied with challenges in integration and validation of complex systems-on-a-chip (SoCs). We present the Chipyard framework, an integrated SoC design, simulation, and implementation environment for specialized compute systems. Chipyard includes configurable, composable, open-source, generator-based IP blocks that can be used across multiple stages of the hardware development flow while maintaining design intent and integration consistency. Through cloud-hosted FPGA accelerated simulation and rapid ASIC implementation, Chipyard enables continuous validation of physically realizable customized systems.

114 citations

Posted Content
22 Nov 2019
TL;DR: Gemmini is presented -- an open source and agile systolic array generator enabling systematic evaluations of deep-learning architectures and achieves two to three orders of magnitude speedup in deep neural network inference compared to the baseline execution on a host processor.
Abstract: Advances in deep learning and neural networks have resulted in the rapid development of hardware accelerators that support them. A large majority of ASIC accelerators, however, target a single hardware design point to accelerate the main computational kernels of deep neural networks such as convolutions or matrix multiplication. On the other hand, the spectrum of use-cases for neural network accelerators, ranging from edge devices to cloud, presents a prime opportunity for agile hardware design and generator methodologies. We present Gemmini -- an open source and agile systolic array generator enabling systematic evaluations of deep-learning architectures. Gemmini generates a custom ASIC accelerator for matrix multiplication based on a systolic array architecture, complete with additional functions for neural network inference. Gemmini runs with the RISC-V ISA, and is integrated with the Rocket Chip System-on-Chip generator ecosystem, including Rocket in-order cores and BOOM out-of-order cores. Through an elaborate design space exploration case study, this work demonstrates the selection processes of various parameters for the use-case of inference on edge devices. Selected design points achieve two to three orders of magnitude speedup in deep neural network inference compared to the baseline execution on a host processor. Gemmini-generated accelerators were used in the fabrication of test systems-on-chip in TSMC 16nm and Intel 22FFL process technologies.

45 citations

Proceedings ArticleDOI
05 Dec 2021
TL;DR: Gemmini as discussed by the authors is an open-source, full-stack DNN accelerator generator that generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects.
Abstract: DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of Systemon-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks.

35 citations

Proceedings ArticleDOI
09 Mar 2020
TL;DR: This work enables agile full-system performance optimization for hardware/software systems with FirePerf, a set of novel out-of-band system-level performance profiling capabilities integrated into the open-source FireSim FPGA-accelerated hardware simulation platform.
Abstract: Achieving high-performance when developing specialized hardware/software systems requires understanding and improving not only core compute kernels, but also intricate and elusive system-level bottlenecks. Profiling these bottlenecks requires both high-fidelity introspection and the ability to run sufficiently many cycles to execute complex software stacks, a challenging combination. In this work, we enable agile full-system performance optimization for hardware/software systems with FirePerf, a set of novel out-of-band system-level performance profiling capabilities integrated into the open-source FireSim FPGA-accelerated hardware simulation platform. Using out-of-band call stack reconstruction and automatic performance counter insertion, FirePerf enables introspecting into hardware and software at appropriate abstraction levels to rapidly identify opportunities for software optimization and hardware specialization, without disrupting end-to-end system behavior like traditional profiling tools. We demonstrate the capabilities of FirePerf with a case study that optimizes the hardware/software stack of an open-source RISC-V SoC with an Ethernet NIC to achieve 8x end-to-end improvement in achievable bandwidth for networking applications running on Linux. We also deploy a RISC-V Linux kernel optimization discovered with FirePerf on commercial RISC-V silicon, resulting in up to 1.72x improvement in network performance.

17 citations


Cited by
More filters
Journal ArticleDOI
01 Jul 2017
TL;DR: A new architecture for a fully optical neural network is demonstrated that enables a computational speed enhancement of at least two orders of magnitude and three order of magnitude in power efficiency over state-of-the-art electronics.
Abstract: Artificial Neural Networks have dramatically improved performance for many machine learning tasks. We demonstrate a new architecture for a fully optical neural network that enables a computational speed enhancement of at least two orders of magnitude and three orders of magnitude in power efficiency over state-of-the-art electronics.

1,955 citations

Journal ArticleDOI
24 Sep 2018-Nature
TL;DR: Monolithically integrated lithium niobate electro-optic modulators that feature a CMOS-compatible drive voltage, support data rates up to 210 gigabits per second and show an on-chip optical loss of less than 0.5 decibels are demonstrated.
Abstract: Electro-optic modulators translate high-speed electronic signals into the optical domain and are critical components in modern telecommunication networks1,2 and microwave-photonic systems3,4. They are also expected to be building blocks for emerging applications such as quantum photonics5,6 and non-reciprocal optics7,8. All of these applications require chip-scale electro-optic modulators that operate at voltages compatible with complementary metal–oxide–semiconductor (CMOS) technology, have ultra-high electro-optic bandwidths and feature very low optical losses. Integrated modulator platforms based on materials such as silicon, indium phosphide or polymers have not yet been able to meet these requirements simultaneously because of the intrinsic limitations of the materials used. On the other hand, lithium niobate electro-optic modulators, the workhorse of the optoelectronic industry for decades9, have been challenging to integrate on-chip because of difficulties in microstructuring lithium niobate. The current generation of lithium niobate modulators are bulky, expensive, limited in bandwidth and require high drive voltages, and thus are unable to reach the full potential of the material. Here we overcome these limitations and demonstrate monolithically integrated lithium niobate electro-optic modulators that feature a CMOS-compatible drive voltage, support data rates up to 210 gigabits per second and show an on-chip optical loss of less than 0.5 decibels. We achieve this by engineering the microwave and photonic circuits to achieve high electro-optical efficiencies, ultra-low optical losses and group-velocity matching simultaneously. Our scalable modulator devices could provide cost-effective, low-power and ultra-high-speed solutions for next-generation optical communication networks and microwave photonic systems. Furthermore, our approach could lead to large-scale ultra-low-loss photonic circuits that are reconfigurable on a picosecond timescale, enabling a wide range of quantum and classical applications5,10,11 including feed-forward photonic quantum computation. Chip-scale lithium niobate electro-optic modulators that rapidly convert electrical to optical signals and use CMOS-compatible voltages could prove useful in optical communication networks, microwave photonic systems and photonic computation.

1,358 citations

Journal ArticleDOI
08 May 2019-Nature
TL;DR: An optical version of a brain-inspired neurosynaptic system, using wavelength division multiplexing techniques, is presented that is capable of supervised and unsupervised learning.
Abstract: Software implementations of brain-inspired computing underlie many important computational tasks, from image processing to speech recognition, artificial intelligence and deep learning applications. Yet, unlike real neural tissue, traditional computing architectures physically separate the core computing functions of memory and processing, making fast, efficient and low-energy computing difficult to achieve. To overcome such limitations, an attractive alternative is to design hardware that mimics neurons and synapses. Such hardware, when connected in networks or neuromorphic systems, processes information in a way more analogous to brains. Here we present an all-optical version of such a neurosynaptic system, capable of supervised and unsupervised learning. We exploit wavelength division multiplexing techniques to implement a scalable circuit architecture for photonic neural networks, successfully demonstrating pattern recognition directly in the optical domain. Such photonic neurosynaptic networks promise access to the high speed and high bandwidth inherent to optical systems, thus enabling the direct processing of optical telecommunication and visual data. An optical version of a brain-inspired neurosynaptic system, using wavelength division multiplexing techniques, is presented that is capable of supervised and unsupervised learning.

862 citations

Journal ArticleDOI
18 Apr 2018-Nature
TL;DR: A way of integrating photonics with silicon nanoelectronics is described, using polycrystalline silicon on glass islands alongside transistors on bulk silicon complementary metal–oxide–semiconductor chips to address the demand for high-bandwidth optical interconnects in data centres and high-performance computing.
Abstract: Electronic and photonic technologies have transformed our lives-from computing and mobile devices, to information technology and the internet. Our future demands in these fields require innovation in each technology separately, but also depend on our ability to harness their complementary physics through integrated solutions1,2. This goal is hindered by the fact that most silicon nanotechnologies-which enable our processors, computer memory, communications chips and image sensors-rely on bulk silicon substrates, a cost-effective solution with an abundant supply chain, but with substantial limitations for the integration of photonic functions. Here we introduce photonics into bulk silicon complementary metal-oxide-semiconductor (CMOS) chips using a layer of polycrystalline silicon deposited on silicon oxide (glass) islands fabricated alongside transistors. We use this single deposited layer to realize optical waveguides and resonators, high-speed optical modulators and sensitive avalanche photodetectors. We integrated this photonic platform with a 65-nanometre-transistor bulk CMOS process technology inside a 300-millimetre-diameter-wafer microelectronics foundry. We then implemented integrated high-speed optical transceivers in this platform that operate at ten gigabits per second, composed of millions of transistors, and arrayed on a single optical bus for wavelength division multiplexing, to address the demand for high-bandwidth optical interconnects in data centres and high-performance computing3,4. By decoupling the formation of photonic devices from that of transistors, this integration approach can achieve many of the goals of multi-chip solutions 5 , but with the performance, complexity and scalability of 'systems on a chip'1,6-8. As transistors smaller than ten nanometres across become commercially available 9 , and as new nanotechnologies emerge10,11, this approach could provide a way to integrate photonics with state-of-the-art nanoelectronics.

630 citations