Proceedings ArticleDOI

State Preserving Dynamic DRAM Bank Re-Configurations for Enhanced Power Efficiency

TL;DR: This work proposes a novel mechanism for dynamically powering down and powering on DRAM banks while taking bank utilization and overall performance into account, and observes power savings of up to 12.31% with an average performance loss of only 0.82% over baseline executions.
Abstract: Power efficiency is one of the grand challenge problems facing computer architecture in recent years. Driven by the push towards green computing, it is imperative to design architectures that provide maximum power savings while incurring minimal overhead on real estate and performance. Towards this, we propose a state-preserving mechanism for dynamically configuring DRAM banks based on utilization. We propose a novel mechanism by which banks can be dynamically powered down and powered on while taking bank utilization and overall performance into account. Through extensive experimentation with memory-intensive applications, we observe power savings of up to 12.31% while incurring an average performance loss of 0.82% over baseline executions.
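
A minimal sketch of how such a utilization-driven, state-preserving bank power manager might be organized is shown below; the epoch length, threshold, and all names are illustrative assumptions, not the paper's actual controller or parameters.

```python
# Illustrative sketch only: a utilization-driven, state-preserving DRAM bank
# power manager. The epoch length, threshold and structure are assumptions,
# not the controller or parameters proposed in the paper.

LOW_UTIL_THRESHOLD = 0.05   # banks below 5% of an epoch's accesses are candidates
EPOCH_CYCLES = 100_000      # caller invokes on_epoch_end() every EPOCH_CYCLES cycles


class Bank:
    def __init__(self, bank_id):
        self.bank_id = bank_id
        self.accesses = 0
        self.powered_down = False   # state-preserving low-power mode: contents retained


class BankPowerManager:
    def __init__(self, num_banks):
        self.banks = [Bank(i) for i in range(num_banks)]

    def on_access(self, bank_id):
        bank = self.banks[bank_id]
        if bank.powered_down:
            bank.powered_down = False   # a wake-up latency penalty would be charged here
        bank.accesses += 1

    def on_epoch_end(self):
        total = sum(b.accesses for b in self.banks) or 1
        for bank in self.banks:
            utilization = bank.accesses / total
            # Power down cold banks; because contents are preserved, no data
            # migration or write-back is required before entering the mode.
            bank.powered_down = utilization < LOW_UTIL_THRESHOLD
            bank.accesses = 0
```

Because the low-power mode is assumed to be state-preserving, no data has to be moved before a bank is powered down, which is what keeps the performance overhead small.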
Citations
Journal ArticleDOI
TL;DR: This paper proposes an efficient cache resizing policy, called Efficient Cache Resizing (ECR), for large last-level caches (LLCs), especially DRAM-based LLCs, implemented on top of a 3D tiled CMP.

3 citations

Proceedings ArticleDOI
06 Jul 2021
TL;DR: In this paper, a cache replacement policy is proposed that assigns priorities at two levels: at the individual block level and at the subset level within a cache set. The policy identifies different types of blocks in a cache and expresses the priority of a subset as a weighted sum of these block counts.
Abstract: The overall performance of a computing system is governed by how efficiently the memory system is managed. The recent boom in data-driven, memory-intensive applications has made this even more prominent. As off-chip memory accesses are costly, efficient management of the last-level cache is the need of the hour. The cache replacement policy plays a vital role in deciding what data to keep in the cache, and there still remains a large gap between the theoretical optimum and the replacement policies implemented in modern processors. Moreover, the majority of existing cache replacement policies are designed prioritizing only one factor at a time (such as hit count, dead blocks, etc.). This paper proposes a cache replacement policy that assigns priorities at two different levels: first at the individual block level and second at the subset level within a cache set. The policy identifies different types of blocks in a cache and expresses the priority of a subset using a weighted sum of these block counts. Extensive simulation on the CMP$im platform using the PARSEC benchmarks shows that the proposed policy achieves a reduction in miss rate of up to 17% with respect to LRU and up to 9% with respect to a state-of-the-art replacement policy. The proposed replacement policy also shows average IPC improvements of 5.3% and 4.8% with respect to LRU and the baseline policy, respectively.
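
A minimal sketch of the two-level victim selection the abstract describes; the block types, weights, and subset size below are assumed for illustration and are not the values used in the paper.

```python
# Illustrative sketch only: two-level victim selection. Block types, weights
# and the subset size are assumptions, not the paper's actual parameters.

from dataclasses import dataclass

WEIGHTS = {"dead": 0.0, "reused": 3.0, "recent": 2.0, "other": 1.0}
SUBSET_SIZE = 4   # e.g. a 16-way set viewed as four 4-way subsets (assumed)


@dataclass
class Block:
    tag: int
    kind: str            # one of "dead", "reused", "recent", "other"
    block_priority: int  # per-block priority (first level)


def subset_priority(subset):
    """Second-level priority: weighted sum of block-type counts in the subset."""
    return sum(WEIGHTS[b.kind] for b in subset)


def choose_victim(cache_set):
    subsets = [cache_set[i:i + SUBSET_SIZE]
               for i in range(0, len(cache_set), SUBSET_SIZE)]
    coldest = min(subsets, key=subset_priority)           # pick the lowest-priority subset
    return min(coldest, key=lambda b: b.block_priority)   # then its lowest-priority block
```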

2 citations

Journal Article
TL;DR: This study models a variation-aware framework, called VAR-DRAM, targeted at modern-day DRAM devices. It provides enhanced power management by taking variations into account, ensures faster execution of programs by internally remapping data from variation-affected cells to normal cells, and preserves data.
Abstract: Dynamic Random Access Memory (DRAM) is the de-facto choice for main memory devices due to its cost-effectiveness. It offers larger capacity and higher bandwidth than SRAM but is slower than the latter. With each passing generation, DRAMs are becoming denser. One of the side-effects is deviation in nominal parameters: process, voltage, and temperature. DRAM is often considered the bottleneck of the system, as it trades off performance for capacity. With such inherent limitations, further deviation from nominal specifications is undesirable. In this paper, we investigate the impact of variations in conventional DRAM devices on performance, reliability, and energy requirements. Based on this study, we model a variation-aware framework, called VAR-DRAM, targeted at modern-day DRAM devices. It provides enhanced power management by taking variations into account. VAR-DRAM ensures faster execution of programs as it internally remaps data from variation-affected cells to normal cells, and it also ensures data preservation. Through extensive experimentation, we find that VAR-DRAM achieves peak energy savings of up to 48.8%, with an average of 29.54%, on DDR4 memories, while improving DRAM access latency by 7.4% compared to a variation-affected device.
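
A minimal sketch of the remapping idea, assuming a profiled list of variation-affected ("weak") rows and a pool of healthy spare rows; the interface and row granularity are illustrative assumptions, not VAR-DRAM's actual design.

```python
# Illustrative sketch only: redirecting accesses from variation-affected
# ("weak") rows to healthy spare rows while preserving data. The profiling
# input, row granularity and spare pool are assumptions for illustration.

class VariationRemapper:
    def __init__(self, weak_rows, spare_rows):
        self.remap_table = {}            # weak row -> healthy spare row
        spares = list(spare_rows)
        for row in weak_rows:
            if spares:
                # Data would be copied from the weak row to the spare row
                # when the mapping is installed, so no data is lost.
                self.remap_table[row] = spares.pop()

    def translate(self, row):
        # Weak rows are transparently redirected; healthy rows map to themselves.
        return self.remap_table.get(row, row)


# Example: rows 7 and 42 were profiled as variation-affected.
remapper = VariationRemapper(weak_rows=[7, 42], spare_rows=[1020, 1021])
assert remapper.translate(7) == 1021   # redirected to a spare row
assert remapper.translate(8) == 8      # unaffected row, unchanged
```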
References
Journal ArticleDOI
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Abstract: The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and flexible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86). The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

4,039 citations


"State Preserving Dynamic DRAM Bank ..." refers methods in this paper

  • ...For evaluation purposes, we have used the GEM5 [1] simulator with the memory type set to our modified DRAMSim2....

    [...]

  • ...The ISA used during the simulation was ALPHA on GEM5....

    [...]


Journal ArticleDOI
TL;DR: This paper describes the process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results.
Abstract: In this paper we present DRAMSim2, a cycle accurate memory system simulator. The goal of DRAMSim2 is to be an accurate and publicly available DDR2/3 memory system model which can be used in both full system and trace-based simulations. We describe the process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results. We outline the combination of DRAMSim2 with a cycle-accurate x86 simulator that can be used to perform full system simulations. Finally, we discuss DRAMVis, a visualization tool that can be used to graph and compare the results of DRAMSim2 simulations.

860 citations


"State Preserving Dynamic DRAM Bank ..." refers methods in this paper

  • ...For simulating the proposed technique, we’ve used a modified version of the cycle-accurate DRAM simulator DRAMSim2 [16]....

    [...]

  • ...Power statistics are recorded from the DRAMSim2 Memory Statistics which yields total average power, background energy, activation energy, burst energy and refresh energy values for each rank after each simulation epoch [16]....

    [...]

Proceedings ArticleDOI
01 Aug 2000
TL;DR: Results indicate that gated-Vdd together with a novel resizable cache architecture reduces energy-delay by 62% with minimal impact on performance.
Abstract: Deep-submicron CMOS designs have resulted in large leakage energy dissipation in microprocessors. While SRAM cells in on-chip cache memories always contribute to this leakage, there is a large variability in active cell usage both within and across applications. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy dissipation in instruction caches. We propose gated-Vdd, a circuit-level technique to gate the supply voltage and reduce leakage in unused SRAM cells. Our results indicate that gated-Vdd together with a novel resizable cache architecture reduces energy-delay by 62% with minimal impact on performance.
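
A back-of-the-envelope sketch of why gating unused cells saves leakage energy; the residual-leakage factor and the power figure below are assumed numbers for illustration, not measurements from the paper.

```python
# Illustrative back-of-the-envelope sketch only: leakage energy when a
# fraction of the cache's SRAM cells is gated off. The residual-leakage
# factor and the power figure below are assumptions, not measurements.

def leakage_energy(total_leakage_power_w, gated_fraction,
                   residual_leak=0.03, seconds=1.0):
    """Energy (J) leaked over `seconds` when `gated_fraction` of cells is gated."""
    active = (1.0 - gated_fraction) * total_leakage_power_w
    gated = gated_fraction * total_leakage_power_w * residual_leak
    return (active + gated) * seconds


baseline = leakage_energy(0.5, gated_fraction=0.0)   # full-size cache
resized = leakage_energy(0.5, gated_fraction=0.7)    # 70% of cells gated off
print(f"leakage energy saved: {100 * (1 - resized / baseline):.1f}%")   # ~67.9%
```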

731 citations


"State Preserving Dynamic DRAM Bank ..." refers background in this paper

  • ...The concept of power gating [15] to gate the supply voltage and reduce leakage in unused SRAM cells has been widely implemented on SRAM circuits [12]....

    [...]

Journal ArticleDOI
12 Nov 2000
TL;DR: This paper considers page allocation policies that can be employed by an informed operating system to complement the hardware power management strategies and makes a compelling case for a cooperative hardware/software approach for exploiting power-aware memory.
Abstract: One of the major challenges of post-PC computing is the need to reduce energy consumption, thereby extending the lifetime of the batteries that power these mobile devices. Memory is a particularly important target for efforts to improve energy efficiency. Memory technology is becoming available that offers power management features such as the ability to put individual chips in any one of several different power modes. In this paper we explore the interaction of page placement with static and dynamic hardware policies to exploit these emerging hardware features. In particular, we consider page allocation policies that can be employed by an informed operating system to complement the hardware power management strategies. We perform experiments using two complementary simulation environments: a trace-driven simulator with workload traces that are representative of mobile computing and an execution-driven simulator with a detailed processor/memory model and a more memory-intensive set of benchmarks (SPEC2000). Our results make a compelling case for a cooperative hardware/software approach for exploiting power-aware memory, with down to as little as 45% of the Energy·Delay for the best static policy and 1% to 20% of the Energy·Delay for a traditional full-power memory.
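
A minimal sketch of a cooperative, "informed" page allocation policy of this flavor: pack allocations onto as few memory devices as possible so that untouched devices stay idle and can be held in a low-power mode by the hardware policy. Device and page counts and the interface are illustrative assumptions, not the policies evaluated in the paper.

```python
# Illustrative sketch only: an "informed" page allocator that packs pages
# onto as few memory devices as possible, leaving the remaining devices idle
# so the hardware policy can hold them in a low-power mode.

PAGES_PER_DEVICE = 1024


class PackedPageAllocator:
    def __init__(self, num_devices):
        self.free = {dev: PAGES_PER_DEVICE for dev in range(num_devices)}

    def allocate(self):
        # Prefer the most-used device that still has room, so untouched
        # devices stay fully idle and power-manageable.
        candidates = [d for d, n in self.free.items() if n > 0]
        if not candidates:
            raise MemoryError("out of physical pages")
        dev = min(candidates, key=lambda d: self.free[d])
        self.free[dev] -= 1
        return dev

    def idle_devices(self):
        # Devices with no allocated pages can be kept in a deep low-power mode.
        return [d for d, n in self.free.items() if n == PAGES_PER_DEVICE]


allocator = PackedPageAllocator(num_devices=4)
for _ in range(PAGES_PER_DEVICE + 1):
    allocator.allocate()
print(allocator.idle_devices())   # two of the four devices remain fully idle
```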

436 citations


"State Preserving Dynamic DRAM Bank ..." refers background in this paper

  • ...In [5], the authors showed how a DRAM chip can be transitioned into a low-power mode....

    [...]

  • ...The prolonged idle behaviour of ranks enabled researchers to exploit this fact [5, 11], and the concept of powering down ranks [5] came into existence....

    [...]

  • ...Prior works [2, 5] have shown that several threshold parameters can determine the idle times at which a DRAM bank can be transitioned to a low-power mode....

    [...]

Journal ArticleDOI
13 Mar 2010
TL;DR: A toolchain for automatically synthesizing c-cores from application source code is presented, and it is demonstrated that c-cores can significantly reduce energy and energy-delay for a wide range of applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
Abstract: Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.

363 citations