Sheng Li
Researcher at Google
Publications - 74
Citations - 4927
Sheng Li is an academic researcher from Google. The author has contributed to research in topics: Cache & Interleaved memory. The author has an h-index of 20 and has co-authored 72 publications receiving 4176 citations. Previous affiliations of Sheng Li include Hewlett-Packard & University of Notre Dame.
Papers
Proceedings Article
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that, when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.
Proceedings Article
Kiln: closing the performance gap between systems with and without persistence support
TL;DR: Kiln is a persistent memory design that adopts a nonvolatile cache and a nonvolatile main memory to enable atomic in-place updates without logging or copy-on-write and can achieve 2× performance improvement compared with NVRAM-based persistent memory with write-ahead logging.
Proceedings Article
CACTI-P: architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques
TL;DR: It is found that although nanosecond-scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impacts on processor performance and energy when used for L1 data caches make nanosecond-scale power-gating a better fit for caches closer to main memory.
Journal Article
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clusters give the best EDA2P and EDAP.
Proceedings Article
CACTI-3DD: architecture-level modeling for 3D die-stacked DRAM main memory
TL;DR: CACTI-3DD is introduced, the first architecture-level integrated power, area, and timing modeling framework for 3D die-stacked off-chip DRAM main memory, and the results show that the 3D DRAM with re-architected DRAM dies achieves significant improvements in power and timing compared to the coarse-grained 3D die-stacked DRAM.