CACTI-P: architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques

doi:10.5555/2132325.2132479

Open Access, Proceedings Article

Sheng Li, +4 more

pp. 694-701

TLDR

It is found that although nanosecond-scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impact on processor performance and energy when used for L1 data caches makes nanosecond-scale power-gating a better fit for caches closer to main memory.

Abstract:

This paper introduces CACTI-P, the first architecture-level integrated power, area, and timing modeling framework for SRAM-based structures with advanced leakage power reduction techniques. CACTI-P supports modeling of major leakage power reduction approaches including power-gating, long channel devices, and Hi-k metal gate devices. Because it accounts for implementation overheads, CACTI-P enables in-depth study of architecture-level tradeoffs for advanced leakage power management schemes. We illustrate the potential applicability of CACTI-P in the design and analysis of leakage power reduction techniques for future manycore processors by applying nanosecond-scale power-gating to different levels of cache in a 64-core multithreaded architecture at the 22nm technology node. Combining results from CACTI-P and a performance simulator, we find that although nanosecond-scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impact on processor performance and energy when used for L1 data caches makes it a better fit for caches closer to main memory.
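The tradeoff described in the abstract can be illustrated with a simple break-even calculation. The sketch below is not CACTI-P's model: the class `GatedArray`, its fields, and every number in the usage note are hypothetical and chosen only for illustration. It assumes that gating removes a fixed fraction of leakage power while an array is idle and that each sleep and wake-up pair costs a fixed amount of energy. It ignores the performance impact of wake-up latency, which the abstract also identifies as a cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatedArray:
    """Hypothetical parameters for one power-gated SRAM array (not CACTI-P outputs)."""

    leakage_power_w: float      # leakage power when the array is fully on, in watts
    leakage_reduction: float    # fraction of leakage removed while gated, in (0, 1]
    transition_energy_j: float  # assumed energy of one sleep + wake-up pair, in joules

    def saved_power_w(self) -> float:
        """Leakage power avoided while the array is gated."""
        return self.leakage_power_w * self.leakage_reduction

    def break_even_idle_s(self) -> float:
        """Idle time at which saved leakage energy equals the transition energy."""
        return self.transition_energy_j / self.saved_power_w()

    def net_saving_j(self, idle_s: float) -> float:
        """Net energy saved by gating for one idle interval (negative means a loss)."""
        return self.saved_power_w() * idle_s - self.transition_energy_j
```

With made-up values such as `GatedArray(0.5, 0.8, 4e-9)`, an idle interval shorter than `break_even_idle_s()` yields a negative `net_saving_j`, while a longer interval yields a positive one. Under these assumptions, a structure that is accessed very frequently leaves little idle time in which gating can pay for itself, whereas one that is accessed rarely has more.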

Citations

Proceedings Article

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory

Mingyu Gao, +4 more

TL;DR: The hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented and it is shown that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations.

Proceedings Article

Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks

Hardik Sharma, +6 more

TL;DR: This work designs Bit Fusion, a bit-flexible accelerator that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.

Journal Article

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

Sheng Li, +5 more

01 Apr 2013, ACM Transactions on Architecture and Cod...

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clusters give the best EDA2P and EDAP.

Proceedings Article

SnaPEA: predictive early activation for reducing computation in deep convolutional neural networks

Vahideh Akhlaghi, +4 more

TL;DR: This paper proposes a predictive early activation technique, dubbed SnaPEA, which offers up to 63% speedup and 49% energy reduction across the convolution layers with no loss in classification accuracy.

Proceedings Article

Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs

Yannan Nellie Wu, +2 more

TL;DR: Accelergy is presented, a generally applicable energy estimation methodology for accelerators that allows design specifications comprised of user-defined high-level compound components and user-defined low-level primitive components, which can be characterized by third-party energy estimation plug-ins.

References

Proceedings Article

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Sheng Li, +5 more

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

Book

CMOS VLSI Design : A Circuits and Systems Perspective

Neil Weste, +1 more

TL;DR: The authors draw upon extensive industry and classroom experience to introduce today's most advanced and effective chip design practices, present extensively updated coverage of every key element of VLSI design, and illuminate the latest design challenges with 65 nm process examples.

Journal Article

Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits

Kaushik Roy, +2 more

TL;DR: Channel engineering techniques including retrograde well and halo doping are explained as means to manage short-channel effects for continuous scaling of CMOS devices and different circuit techniques to reduce the leakage power consumption are explored.

Journal Article

Niagara: a 32-way multithreaded Sparc processor

P. Kongetira, +2 more

01 Mar 2005, IEEE Micro

TL;DR: The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications that exploits the thread-level parallelism inherent to server applications, while targeting low levels of power consumption.

An Enhanced Access and Cycle Time Model for On-Chip Caches

Steven J. E. Wilton, +1 more

Related Papers (5)

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

The gem5 simulator

EIE: efficient inference engine on compressed deep neural network

In-Datacenter Performance Analysis of a Tensor Processing Unit

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning