Steve Dai
Researcher at Cornell University
Publications: 31
Citations: 734
Steve Dai is an academic researcher from Cornell University. The author has contributed to research in the topics of high-level synthesis and pipelining (computing). The author has an h-index of 12 and has co-authored 28 publications receiving 428 citations. Previous affiliations of Steve Dai include Nvidia and Stanford University.
Papers
Proceedings Article
MAGNet: A Modular Accelerator Generator for Neural Networks
Rangharajan Venkatesan, Priyanka Raina, Yanqing Zhang, Brian Zimmer, William J. Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany, Yakun Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney +15 more
TL;DR: MAGNet, a modular accelerator generator for neural networks, is proposed, and an inference accelerator optimized for image classification is designed using three different neural networks: AlexNet, ResNet, and DriveNet.
Proceedings Article
Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs
Yuan Zhou, Udit Gupta, Steve Dai, Ritchie Zhao, Nitish Srivastava, Hanchen Jin, Joseph Featherston, Yi-Hsiang Lai, Gai Liu, Gustavo Angarita Velasquez, Wenping Wang, Zhiru Zhang +11 more
TL;DR: Rosetta is a realistic benchmark suite for software programmable FPGAs that is useful to the HLS research community and can also serve as a set of design tutorials for non-expert HLS users.
Proceedings Article
Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning
TL;DR: This work builds a large collection of C-to-FPGA results from a diverse set of realistic HLS applications, identifies relevant features from HLS reports for estimating post-implementation metrics, and trains and compares a number of promising machine learning models to effectively and efficiently bridge the accuracy gap.
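As a rough illustration of the flow this paper describes (not the authors' actual models or features; the feature, data values, and helper names below are invented for the sketch), the idea is to fit a model that maps HLS-reported estimates to post-implementation measurements:

```python
# Hypothetical sketch: calibrate an HLS report metric (e.g., estimated LUTs)
# against measured post-implementation LUTs with a simple one-feature
# ordinary-least-squares fit. The real work trains richer ML models on
# many features extracted from HLS reports.

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Toy training set: HLS-reported LUT estimate vs. post-implementation LUTs.
hls_luts = [1000, 2000, 4000, 8000]
impl_luts = [1200, 2300, 4500, 8900]

a, b = fit_linear(hls_luts, impl_luts)

def predict(x):
    """Predicted post-implementation LUTs for an HLS-reported estimate x."""
    return a * x + b
```

In practice the paper's point is that raw HLS estimates are systematically off, so even a learned correction like this (and much more so a nonlinear model over many report features) narrows the gap to post-implementation results without running place-and-route.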
Journal Article
The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips
Scott Davidson, Shaolin Xie, Christopher Torng, Khalid Al-Hawai, Austin Rovinski, Tutu Ajayi, Luis Vega, Chun Zhao, Ritchie Zhao, Steve Dai, Aporva Amarnath, Bandhav Veluri, Paul Gao, Anuj Rao, Gai Liu, Rajesh Gupta, Zhiru Zhang, Ronald G. Dreslinski, Christopher Batten, Michael Taylor +19 more
TL;DR: The Celerity 16-nm open-source SoC was implemented in nine months using an architectural trifecta to minimize development time: a general-purpose tier, a massively parallel tier comprising a tiled RISC-V manycore array, and a specialization tier that uses high-level synthesis to create an algorithmic neural-network accelerator.
Proceedings Article
ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
TL;DR: ElasticFlow is proposed, a novel architectural synthesis approach that dynamically distributes inner loops to an array of loop processing units (LPUs) in a complexity-effective manner; it demonstrates significant performance improvements over a widely used commercial HLS tool for Xilinx FPGAs.