scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Write-variation aware alternatives to replace SRAM buffers with non-volatile buffers in on-chip interconnects

01 Nov 2019-Iet Computers and Digital Techniques (The Institution of Engineering and Technology)-Vol. 13, Iss: 6, pp 481-492
TL;DR: Proposed policies to reduce the leakage power consumption of NoC buffers by the use of non-volatile spin transfer torque random access memory (STT-RAM)-based buffers and improve lifetime by 3.2 times and 1093 times, respectively are presented.
Abstract: With the advancement in CMOS technology and multiple processors on the chip, communication across these cores is managed by a network-on-chip (NoC). Power and performance of these NoC interconnects have become a significant factor.The authors aim to reduce the leakage power consumption of NoC buffers by the use of non-volatile spin transfer torque random access memory (STT-RAM)-based buffers. STT-RAM technology has the advantages of high density and low leakage but suffers from low endurance. This low endurance has an impact on the lifetime of the router on the whole due to unwanted write-variations governed by virtual channel (VC) allocation policies. Here various VC allocation policies that help the uniform distribution of the writes across the buffers are proposed. Iso-capacity and iso-area-based alternatives to replace SRAM buffers with STT-RAM buffers are also presented. Pure STT-RAM buffers, however, impact the network latency. To mitigate this, a hybrid variant of the proposed policies which uses alternative VCs made of SRAM technology in the case of heavy network traffic is proposed. Experimental evaluation of full system simulation shows that proposed policies reduce the write variation by 99% and improve lifetime by 3.2 times and 1093 times, respectively. Also a 55.5% gain in the energy delay product is obtained.
Citations
More filters
Proceedings ArticleDOI
10 Aug 2020
TL;DR: A write reduction technique, which is based on dirty flits present in write-back data packets, which results in a significant decrease in total and dynamic network power consumption and shows remarkable improvement in the lifetime.
Abstract: In a multi-core system, communication across cores is managed by an on-chip interconnect called Network-on-Chip (NoC). The utilization of NoC results in limitations such as high communication delay and high network power consumption. The buffers of the NoC router consume a considerable amount of leakage power. This paper attempts to reduce leakage power consumption by using Non-Volatile Memory technology-based buffers. NVM technology has the advantage of higher density and low leakage but suffers from costly write operation, and weaker write endurance. These characteristics impact on the total network power consumption, network latency, and lifetime of the router as a whole.In this paper, we propose a write reduction technique, which is based on dirty flits present in write-back data packets. The method also suggests a dirty flit based Virtual Channel (VC) allocation technique that distributes writes in NVM technology-based VCs to improve the lifetime of NVM buffers.The experimental evaluation on the full system simulator shows that the proposed policy obtains a 53% reduction in write-back flits, which results in 27% lesser total network flit on average. All these results in a significant decrease in total and dynamic network power consumption. The policy also shows remarkable improvement in the lifetime.

2 citations


Cites background from "Write-variation aware alternatives ..."

  • ...[16, 17] presented wear-leveling techniques to remove unwanted write variation in NVM based buffers to improve the lifetime of NoC routers....

    [...]

Journal ArticleDOI
TL;DR: Keep the routers always powered ON to maintain constant connectivity and investigate various approaches to use a combination of SRAM and nonvolatile spin-transfer torque random access memory-based VCs in the routers, which yield significant energy savings while maintaining connectivity.
Abstract: In the era of dark silicon, several components on the chip [i.e., cores, memory, and network on chip (NoC)] need to be powered-off or run in low-power mode. This is mainly due to the increased leakage power consumption at smaller technology nodes. Other than the power consumed by cores and caches, power and performance of the interconnects is a significant factor as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. To support dark silicon and save energy, a popular approach is to power off the routers and wake them up when needed. However, this affects the packet latency, and we need to observe the traffic through the nodes to decide turning the routers ON–OFF. In this article, we propose to keep the routers always powered ON to maintain constant connectivity and investigate various approaches. One proposal is to frequency scale the routers connected to powered OFF nodes, and the other proposals are to use a combination of SRAM and nonvolatile spin-transfer torque random access memory-based VCs in the routers. By managing which VCs to be active at a given time, we achieve energy savings. The proposals are evaluated by varying the percentage of dark nodes on the chip. The experimental results show that all proposals yield significant energy savings while maintaining connectivity.

1 citations


Cites background from "Write-variation aware alternatives ..."

  • ...Rani and Kapoor [44], [45] presented wear-leveling techniques to remove unwanted write variation in NVMbased buffers to improve the lifetime of NoC routers....

    [...]

References
More filters
Journal ArticleDOI
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Abstract: The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and exible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86).The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

4,039 citations

Journal ArticleDOI
John L. Henning1
TL;DR: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006, which replaces CPU2000, and the SPEC CPU benchmarks are widely used in both industry and academia.
Abstract: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006 [2], which replaces CPU2000. The SPEC CPU benchmarks are widely used in both industry and academia [3].

1,864 citations

Journal ArticleDOI
TL;DR: A comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity-dark silicon-is timely and crucial.
Abstract: A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity-dark silicon-is timely and crucial.

1,556 citations

Journal ArticleDOI
TL;DR: NVSim is developed, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash and is expected to help boost architecture-level NVM-related studies.
Abstract: Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated new NVM candidate technologies, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive random-access memory (ReRAM) are regarded as the most promising candidates. As the ultimate goal of this NVM research is to deploy them into multiple levels in the memory hierarchy, it is necessary to explore the wide NVM design space and find the proper implementation at different memory hierarchy levels from highly latency-optimized caches to highly density- optimized secondary storage. While abundant tools are available as SRAM/DRAM design assistants, similar tools for NVM designs are currently missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies.

1,100 citations

Journal ArticleDOI
TL;DR: The Raw microprocessor research prototype uses a scalable instruction set architecture to attack the emerging wire-delay problem by providing a parallel, software interface to the gate, wire and pin resources of the chip.
Abstract: Wire delay is emerging as the natural limiter to microprocessor scalability. A new architectural approach could solve this problem, as well as deliver unprecedented performance, energy efficiency and cost effectiveness. The Raw microprocessor research prototype uses a scalable instruction set architecture to attack the emerging wire-delay problem by providing a parallel, software interface to the gate, wire and pin resources of the chip. An architecture that has direct, first-class analogs to all of these physical resources will ultimately let programmers achieve the maximum amount of performance and energy efficiency in the face of wire delay.

1,087 citations