Mingyu Gao
Researcher at Stanford University
Publications - 32
Citations - 1977
Mingyu Gao is an academic researcher from Stanford University. The author has contributed to research in topics: Computer science & Row. The author has an h-index of 12 and has co-authored 22 publications receiving 1220 citations. Previous affiliations of Mingyu Gao include Samsung & Tsinghua University.
Papers
Proceedings Article
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
TL;DR: The hardware architecture and the software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented, and it is shown that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations.
Proceedings Article
Practical Near-Data Processing for In-Memory Analytics Frameworks
TL;DR: This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks, and shows that it is critical to optimize software frameworks for spatial locality, as this leads to 2.9x efficiency improvements for NDP.
Journal Article
Energy-Efficient Abundant-Data Computing: The N3XT 1,000x
Mohamed M. Sabry Aly, Mingyu Gao, Gage Hills, Chi-Shuen Lee, Greg Pitner, Max M. Shulaker, Tony F. Wu, Mehdi Asheghi, Jeffrey Bokor, Franz Franchetti, Kenneth E. Goodson, Christos Kozyrakis, Igor L. Markov, Kunle Olukotun, Larry Pileggi, Eric Pop, Jan M. Rabaey, Christopher Ré, H.-S. Philip Wong, Subhasish Mitra
TL;DR: N3XT improves the energy efficiency of abundant-data applications 1,000-fold by using new logic and memory technologies, 3D integration with fine-grained connectivity, and new architectures for computation immersed in memory.
Proceedings Article
HRL: Efficient and flexible reconfigurable logic for near-data processing
Mingyu Gao, Christos Kozyrakis
TL;DR: This paper presents Heterogeneous Reconfigurable Logic (HRL), a reconfigurable array for NDP systems that improves on both FPGA and CGRA arrays and achieves 92% of the peak performance of an NDP system based on custom accelerators for each application.
Proceedings Article
GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition
Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian
TL;DR: This paper argues that a PIM-based graph processing system should take data organization as a first-order design consideration and proposes GraphP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT.