Open Access, Proceedings Article

CACTI-P: architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques

TLDR
It is found that although nanosecond scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impacts on processor performance and energy when being used for L1 data caches make nanosecond scale power-gating a better fit for caches closer to main memory.
Abstract
This paper introduces CACTI-P, the first architecture-level integrated power, area, and timing modeling framework for SRAM-based structures with advanced leakage power reduction techniques. CACTI-P supports modeling of major leakage power reduction approaches including power-gating, long channel devices, and Hi-k metal gate devices. Because it accounts for implementation overheads, CACTI-P enables in-depth study of architecture-level tradeoffs for advanced leakage power management schemes. We illustrate the potential applicability of CACTI-P in the design and analysis of leakage power reduction techniques of future manycore processors by applying nanosecond scale power-gating to different levels of cache for a 64 core multithreaded architecture at the 22nm technology. Combining results from CACTI-P and a performance simulator, we find that although nanosecond scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impacts on processor performance and energy when being used for L1 data caches make nanosecond scale power-gating a better fit for caches closer to main memory.
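The tradeoff the abstract describes, leakage saved while a structure is gated versus the overhead of gating it, can be illustrated with a simple break-even calculation. The sketch below is not CACTI-P's model: the function names, parameters, and numbers are hypothetical, and it assumes a single lumped transition energy for one sleep/wake cycle.

```python
def break_even_idle_ns(leak_on_mw, leak_gated_mw, transition_energy_pj):
    """Idle time (ns) beyond which power-gating saves net energy.

    All inputs are illustrative assumptions, not CACTI-P outputs.
    """
    saved_mw = leak_on_mw - leak_gated_mw
    if saved_mw <= 0:
        raise ValueError("gated leakage must be lower than ungated leakage")
    # 1 mW * 1 ns = 1 pJ, so pJ / mW yields nanoseconds directly.
    return transition_energy_pj / saved_mw


def net_energy_saved_pj(idle_ns, leak_on_mw, leak_gated_mw, transition_energy_pj):
    """Leakage energy saved over one idle period minus the sleep/wake overhead."""
    return (leak_on_mw - leak_gated_mw) * idle_ns - transition_energy_pj


if __name__ == "__main__":
    # Hypothetical cache bank: 10 mW leakage when on, 2 mW when gated,
    # 40 pJ spent entering and leaving the gated state.
    print(break_even_idle_ns(10.0, 2.0, 40.0))
    print(net_energy_saved_pj(20.0, 10.0, 2.0, 40.0))
```

Under these assumptions, idle periods shorter than the break-even time lose energy when gated. This energy-only view also leaves out the wake-up latency, which is the performance cost the abstract identifies for L1 data caches.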


Citations
Proceedings Article

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory

TL;DR: The hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented and it is shown that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations.
Proceedings Article

Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks

TL;DR: This work designs Bit Fusion, a bit-flexible accelerator that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.
Journal Article

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clusters give the best EDA2P and EDAP.
Proceedings Article

SnaPEA: predictive early activation for reducing computation in deep convolutional neural networks

TL;DR: This paper proposes a predictive early activation technique, dubbed SnaPEA, which offers up to 63% speedup and 49% energy reduction across the convolution layers with no loss in classification accuracy.
Proceedings Article

Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs

TL;DR: Accelergy is presented, a generally applicable energy estimation methodology for accelerators that allows design specifications comprised of user-defined high-level compound components and user-defined low-level primitive components, which can be characterized by third-party energy estimation plug-ins.
References
Proceedings Article

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.
Book

CMOS VLSI Design : A Circuits and Systems Perspective

TL;DR: The authors draw upon extensive industry and classroom experience to introduce today's most advanced and effective chip design practices, present extensively updated coverage of every key element of VLSI design, and illuminate the latest design challenges with 65 nm process examples.
Journal Article

Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits

TL;DR: Channel engineering techniques including retrograde well and halo doping are explained as means to manage short-channel effects for continuous scaling of CMOS devices and different circuit techniques to reduce the leakage power consumption are explored.
Journal Article

Niagara: a 32-way multithreaded Sparc processor

TL;DR: The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications that exploits the thread-level parallelism inherent to server applications, while targeting low levels of power consumption.