Xuehai Qian
Researcher at University of Southern California
Publications - 121
Citations - 4172
Xuehai Qian is an academic researcher from University of Southern California. The author has contributed to research in topics: Computer science & Speedup. The author has an h-index of 25 and has co-authored 107 publications receiving 2537 citations. Previous affiliations of Xuehai Qian include Rutgers University & Chinese Academy of Sciences.
Papers
Proceedings Article
PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
TL;DR: PipeLayer is presented, a ReRAM-based PIM accelerator for CNNs that supports both training and testing. It proposes a highly parallel design based on the notions of parallelism granularity and weight replication, which enables highly pipelined execution of both training and testing without introducing the potential stalls of previous work.
Proceedings Article
CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices
Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan
TL;DR: The CirCNN architecture is proposed, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, and scale). FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations.
Proceedings Article
GraphR: Accelerating Graph Processing Using ReRAM
TL;DR: GraphR is the first ReRAM-based graph processing accelerator. It is based on the principle of near-data processing and explores the opportunity of performing massively parallel analog operations at low hardware and energy cost.
Proceedings Article
DudeTM: Building Durable Transactions with Decoupling for Persistent Memory
TL;DR: DudeTM is presented, a crash-consistent durable transaction system that avoids the drawbacks of both undo logging and redo logging. It can be implemented with existing hardware TMs with minor hardware modifications, leading to a further 1.7× speedup.
Proceedings Article
GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition
Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian
TL;DR: The authors argue that a PIM-based graph processing system should take data organization as a first-order design consideration, and they propose GraphP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT.