Yubin Li
Researcher at Tsinghua University
Publications - 7
Citations - 935
Yubin Li is an academic researcher from Tsinghua University. The author has contributed to research in topics: Speedup & Hardware acceleration. The author has an h-index of 4 and has co-authored 7 publications receiving 790 citations.
Papers
Proceedings ArticleDOI
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally +11 more
TL;DR: The authors propose the Efficient Speech Recognition Engine (ESE), built around a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization).
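To make the load-balance-aware pruning idea concrete, here is a minimal NumPy sketch: weight rows are grouped into banks (one per processing element) and every bank is pruned to the same number of surviving weights, so no PE becomes a straggler. The bank count, sparsity level, and `load_balance_prune` helper are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def load_balance_prune(W, num_banks=4, sparsity=0.9):
    """Prune each bank of rows to the same number of nonzeros.

    Illustrative sketch: rows are split into `num_banks` groups (one per PE)
    and each group keeps only its largest-magnitude weights, so every PE
    ends up with the same amount of work.
    """
    W = W.copy()
    rows_per_bank = W.shape[0] // num_banks
    keep = int(round(rows_per_bank * W.shape[1] * (1.0 - sparsity)))
    for b in range(num_banks):
        bank = W[b * rows_per_bank:(b + 1) * rows_per_bank]
        flat = np.abs(bank).ravel()
        if keep < flat.size:
            thresh = np.partition(flat, flat.size - keep)[flat.size - keep]
            bank[np.abs(bank) < thresh] = 0.0
    return W

W = np.random.randn(64, 128).astype(np.float32)
W_pruned = load_balance_prune(W, num_banks=4, sparsity=0.9)
print("nonzeros per bank:",
      [np.count_nonzero(W_pruned[i * 16:(i + 1) * 16]) for i in range(4)])
```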
Posted Content
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally +11 more
TL;DR: This work proposes a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy, and proposes a scheduler that encodes and partitions the compressed model across multiple PEs for parallelism and schedules the complicated LSTM data flow.
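As a rough software analogue of the partitioning step described in the summary, the sketch below encodes each row's nonzeros and interleaves rows across processing elements. The round-robin assignment and the (column, value) encoding are assumptions chosen for simplicity; they stand in for the paper's actual scheduler rather than reproducing it.

```python
import numpy as np

def partition_rows_to_pes(W_pruned, num_pes=4):
    """Encode each row's nonzeros and interleave rows across PEs round-robin.

    Illustrative assumption: each PE receives every `num_pes`-th row in
    (column_index, value) form, approximating an even split of the
    compressed workload.
    """
    pes = [[] for _ in range(num_pes)]
    for r, row in enumerate(W_pruned):
        cols = np.nonzero(row)[0]
        encoded = list(zip(cols.tolist(), row[cols].tolist()))
        pes[r % num_pes].append((r, encoded))
    return pes

# Usage with the pruned matrix from the previous sketch:
# pes = partition_rows_to_pes(W_pruned, num_pes=4)
# print([sum(len(enc) for _, enc in pe) for pe in pes])  # nonzeros per PE
```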
Posted Content
ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA
Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally +11 more
TL;DR: This work proposes a load-balance-aware pruning method that can compress the LSTM model size by 20× (10× from pruning and 2× from quantization) with negligible loss of prediction accuracy, and designs the hardware architecture, named Efficient Speech Recognition Engine (ESE), that works directly on the compressed model.
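For intuition on what "works directly on the compressed model" means, here is a hedged software counterpart: a sparse matrix-vector multiply over a CSR-style encoding, so only the stored nonzeros are ever fetched and multiplied. CSR is a common compressed layout used here purely for illustration; the ESE hardware's actual encoding may differ.

```python
import numpy as np

def dense_to_csr(W):
    """Flatten a pruned matrix into CSR-style arrays (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in W:
        nz = np.nonzero(row)[0]
        values.extend(row[nz].tolist())
        col_idx.extend(nz.tolist())
        row_ptr.append(len(values))
    return (np.array(values, dtype=np.float32),
            np.array(col_idx), np.array(row_ptr))

def spmv_compressed(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply that touches only stored nonzeros."""
    y = np.zeros(len(row_ptr) - 1, dtype=np.float32)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# Usage with the pruned matrix from the earlier sketch:
# vals, cols, ptr = dense_to_csr(W_pruned)
# y = spmv_compressed(vals, cols, ptr, np.random.randn(W_pruned.shape[1]).astype(np.float32))
```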
Patent
Efficient data access control device for neural network hardware acceleration system
TL;DR: In this article, the authors propose an overall design for a device that handles data receiving, bit-width transformation, and data storing; by employing the disclosed technique, a neural network hardware acceleration system can prevent data access from becoming the bottleneck in neural network computation.
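The summary mentions a bit-width transformation step between receiving data and storing it; the sketch below shows one plausible form of that step, converting 32-bit floats into narrower fixed-point words before they go to on-chip buffers. The 16-bit word, the Q8.8-style scaling, and the helper names are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def to_fixed_point(x, frac_bits=8, word_bits=16):
    """Convert float32 data to signed fixed-point words of a narrower bit width.

    Assumed Q8.8-style format (8 fractional bits in a 16-bit word); the real
    device's bit widths are not specified in the summary above.
    """
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int16)

def from_fixed_point(q, frac_bits=8):
    """Recover approximate float values from the fixed-point words."""
    return q.astype(np.float32) / (1 << frac_bits)

data = np.random.randn(8).astype(np.float32)
stored = to_fixed_point(data)                             # narrower words for on-chip storage
print(np.max(np.abs(data - from_fixed_point(stored))))    # quantization error
```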
Proceedings ArticleDOI
Streaming sorting network based BWT acceleration on FPGA for lossless compression
TL;DR: A novel BWT accelerator based on a streaming sorting network that achieves a 14.3X speedup over the state-of-the-art work when the data block size is 4KB, together with a lossless data compression system built on this accelerator.
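As a software reference point for the transform being accelerated, here is a textbook-style Burrows-Wheeler Transform in Python using full rotation sorting. The FPGA design replaces this sort with a streaming sorting network in hardware; the sentinel-byte convention and the O(n^2) inverse below are just one common formulation, kept simple for clarity.

```python
def bwt(data: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Textbook Burrows-Wheeler Transform via sorted rotations.

    Reference implementation only: the FPGA accelerator performs the
    equivalent sort with a streaming sorting network. Assumes the
    sentinel byte does not occur in the input.
    """
    s = data + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(rot[-1] for rot in rotations)

def inverse_bwt(last_col: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Invert the transform by repeatedly prepending and sorting (quadratic, for clarity)."""
    table = [b""] * len(last_col)
    for _ in range(len(last_col)):
        table = sorted(last_col[i:i + 1] + table[i] for i in range(len(last_col)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]

original = b"banana"
assert inverse_bwt(bwt(original)) == original
```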