CACTI-P: architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques
Sheng Li,Ke Chen,Jung Ho Ahn,Jay B. Brockman,Norman P. Jouppi +4 more
- pp 694-701
Reads0
Chats0
TLDR
It is found that although nanosecond scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impacts on processor performance and energy when being used for L1 data caches make nanose Cond scalePower-Gating a better fit for caches closer to main memory.Abstract:
This paper introduces CACTI-P, the first architecture-level integrated power, area, and timing modeling framework for SRAM-based structures with advanced leakage power reduction techniques. CACTI-P supports modeling of major leakage power reduction approaches including power-gating, long channel devices, and Hi-k metal gate devices. Because it accounts for implementation overheads, CACTI-P enables in-depth study of architecture-level tradeoffs for advanced leakage power management schemes. We illustrate the potential applicability of CACTI-P in the design and analysis of leakage power reduction techniques of future manycore processors by applying nanosecond scale power-gating to different levels of cache for a 64 core multithreaded architecture at the 22nm technology. Combining results from CACTI-P and a performance simulator, we find that although nanosecond scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impacts on processor performance and energy when being used for L1 data caches make nanosecond scale power-gating a better fit for caches closer to main memory.read more
Citations
More filters
Proceedings ArticleDOI
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
TL;DR: The hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented and it is shown that despite the use of small SRAM buffers, the presence of3D memory simplifies dataflow scheduling for NN computations.
Proceedings ArticleDOI
Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks
Hardik Sharma,Jongse Park,Naveen Suda,Liangzhen Lai,Benson Chau,Vikas Chandra,Hadi Esmaeilzadeh +6 more
TL;DR: This work designs Bit Fusion, a bit-flexible accelerator that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.
Journal ArticleDOI
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clusters give the best EDA2P and EDAP.
Proceedings ArticleDOI
SnaPEA: predictive early activation for reducing computation in deep convolutional neural networks
TL;DR: This paper proposes a predictive early activation technique, dubbed SnaPEA, which offers up to 63% speedup and 49% energy reduction across the convolution layers with no loss in classification accuracy.
Proceedings ArticleDOI
Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs
TL;DR: Accelergy is presented, a generally applicable energy estimation methodology for accelerators that allows design specifications comprised of user- defined high-level compound components and user-defined low-level primitive components, which can be characterized by third-party energy estimation plug-ins.
References
More filters
Proceedings ArticleDOI
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taking into account configuring clusters with 4 cores gives thebest EDA2P and EDAP.
Book
CMOS VLSI Design : A Circuits and Systems Perspective
Neil Weste,David Money Harris +1 more
TL;DR: The authors draw upon extensive industry and classroom experience to introduce todays most advanced and effective chip design practices, and present extensively updated coverage of every key element of VLSI design, and illuminate the latest design challenges with 65 nm process examples.
Journal ArticleDOI
Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits
TL;DR: Channel engineering techniques including retrograde well and halo doping are explained as means to manage short-channel effects for continuous scaling of CMOS devices and different circuit techniques to reduce the leakage power consumption are explored.
Journal ArticleDOI
Niagara: a 32-way multithreaded Sparc processor
TL;DR: The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications that exploits the thread-level parallelism inherent to server applications, while targeting low levels of power consumption.
Related Papers (5)
The gem5 simulator
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi,Cliff Young,Nishant Patil,David A. Patterson,Gaurav Agrawal,Raminder Bajwa,Sarah Bates,Suresh Bhatia,Nan Boden,Albert T. Borchers,Rick Boyle,Pierre-luc Cantin,Clifford Chao,Christopher Aaron Clark,Jeremy Coriell,Michael J. Daley,Matt Dau,Jeffrey Dean,Ben Gelb,Tara Vazir Ghaemmaghami,Rajendra Gottipati,William John Gulland,Robert Hagmann,C. Richard Ho,Doug Hogberg,John Hu,Robert Hundt,D. Hurt,Julian Ibarz,Aaron Jaffey,Alek Jaworski,Alexander Kaplan,Khaitan Harshit,Daniel Killebrew,Andy Koch,Naveen Kumar,Steve Lacy,James Laudon,James Law,Diemthu Le,Chris Leary,Zhuyuan Liu,Kyle Lucke,Alan Lundin,Gordon MacKean,Adriana Maggiore,Maire Mahony,Kieran Miller,Rahul Nagarajan,Ravi Narayanaswami,Ray Ni,Kathy Nix,Thomas Norrie,Mark Omernick,Narayana Penukonda,Andrew Everett Phelps,Jonathan Ross,Matt Ross,Amir Salek,Emad Samadiani,Chris Severn,Gregory Sizikov,Matthew Snelham,Jed Souter,Dan Steinberg,Andy Swing,Mercedes Tan,Gregory Michael Thorson,Bo Tian,Horia Toma,Erick Tuttle,Vijay K. Vasudevan,Richard Walter,Walter Wang,Eric Wilcox,Doe Hyun Yoon +75 more