Shiwei Liu
Researcher at Fudan University
Publications - 5
Citations - 14
Shiwei Liu is an academic researcher from Fudan University who has contributed to research on clock rate and convolutional neural networks. The author has an h-index of 1 and has co-authored 3 publications receiving 4 citations.
Papers
Journal ArticleDOI
A Communication-Aware DNN Accelerator on ImageNet Using In-Memory Entry-Counting Based Algorithm-Circuit-Architecture Co-Design in 65-nm CMOS
Haozhe Zhu, Chixiao Chen, Shiwei Liu, Qiaosha Zou, Mingyu Wang, Lihua Zhang, Xiaoyang Zeng, C.-J. Richard Shi, and 7 others
TL;DR: This article presents a communication-aware processing-in-memory deep neural network accelerator, which implements an in-memory entry-counting scheme for low bit-width quantized multiply-and-accumulate operations (MACs) to maintain good accuracy on ImageNet.
Proceedings ArticleDOI
Systolic-Array Deep-Learning Acceleration Exploring Pattern-Indexed Coordinate-Assisted Sparsity for Real-Time On-Device Speech Processing
TL;DR: In this paper, a hardware-software co-design for efficient sparse deep neural networks (DNNs) implementation in a regular systolic array for real-time on-device speech processing is presented.
A Scalable Die-to-Die Interconnect with Replay and Repair Schemes for 2.5D/3D Integration
Bo Jiao, Jinshan Zhang, Shiwei Liu, Hao Jiang, Jun Tao, Wenning Jiang, Qi Liu, Haozhe Zhu, Chixiao Chen, and 8 others
TL;DR: In this article, a scalable die-to-die (D2D) interconnect with replay and repair schemes is presented for high-efficiency 2.5D/3D integration; it can be configured for power consumption as low as 0.55 pJ/bit while delivering 38.40 Gb/s throughput.
Proceedings ArticleDOI
XNORAM: An Efficient Computing-in-Memory Architecture for Binary Convolutional Neural Networks with Flexible Dataflow Mapping
TL;DR: An energy-efficient computing-in-memory architecture for binary convolutional neural networks, called XNORAM, is proposed; it achieves 18.86 TOPS/W and 4.63 GOPS/KB utilization with only 1.3% accuracy loss compared with the original XNOR-Net result on GPUs.
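The core operation that binary-CNN accelerators such as XNORAM map into memory arrays is the XNOR-popcount dot product: with weights and activations restricted to {-1, +1} and packed as bit vectors, a MAC collapses to an XNOR followed by a bit count. A minimal software sketch of that identity (the function name and bit encoding are illustrative, not from the paper):

```python
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors of length n, each packed as an
    n-bit integer where bit value 1 encodes +1 and bit value 0 encodes -1.
    XNOR marks positions where the operands agree; if m bits agree, the
    signed dot product is m - (n - m) = 2*m - n."""
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

# a = 0b1011 -> (+1, -1, +1, +1), w = 0b1101 -> (+1, +1, -1, +1) (MSB first)
# elementwise products: +1, -1, -1, +1, so the dot product is 0
print(binary_dot(0b1011, 0b1101, 4))  # -> 0
```

Replacing each multiply-accumulate with one XNOR gate and a popcount is what lets the scheme execute entirely inside an SRAM-like array.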
Proceedings ArticleDOI
A 200M-Query-Vector/s Computing-in-RRAM ADC-less k-Nearest-Neighbor Accelerator with Time-Domain Winner-Takes-All Circuits
TL;DR: This paper proposes a computing-in-RRAM ADC-less k-nearest-neighbor accelerator with time-domain winner-takes-all circuits, which processes up to 200 million query vectors per second while consuming 0.75 mW, a 24.5× energy-efficiency improvement over prior works.
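Functionally, a time-domain winner-takes-all circuit lets the entries with the smallest (analog) distance signals fire first, so the accelerator returns the k database vectors closest to the query without digitizing every distance. A hedged software model of that behavior, using squared-L2 distance and a heap in place of the analog race (all names here are illustrative, not from the paper):

```python
import heapq

def knn_wta(query, database, k):
    """Software model of k-nearest-neighbor search: rank database vectors
    by squared-L2 distance to the query and return the indices of the k
    smallest, mimicking a winner-takes-all race where the closest entries
    'win' first."""
    dists = [(sum((q - d) ** 2 for q, d in zip(query, vec)), idx)
             for idx, vec in enumerate(database)]
    return [idx for _, idx in heapq.nsmallest(k, dists)]

db = [[0, 0], [1, 1], [5, 5], [1, 0]]
print(knn_wta([0.2, 0.1], db, k=2))  # -> [0, 3], the two closest vectors
```

The hardware avoids the heap (and the ADCs) entirely: distance magnitudes are encoded as firing times, so selection is implicit in which circuits trigger first.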