Journal ArticleDOI

Write Variation Aware Buffer Assignment for Improved Lifetime of Non-Volatile Buffers in On-Chip Interconnects

TL;DR: This paper replaces NoC router buffers with non-volatile spin-transfer torque random access memory (STT-RAM) buffers to reduce static power consumption, and proposes write-variation-aware policies that reduce write variation to almost 0% and improve lifetime by 3.3 and 19.9 times for intra-VNet and inter-VNet, respectively.
Abstract: With multiple cores integrated on the same die, communication across cores is managed by an on-chip interconnect called network-on-chip (NoC). The power and performance of these interconnects are significant factors, as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. This paper attempts to reduce static power consumption by using non-volatile memory technology-based spin-transfer torque random access memory (STT-RAM) buffers. STT-RAM technology has the advantage of high density and low leakage but suffers from weaker write endurance. This impacts the lifetime of the router as a whole. The buffers in a router are allocated to virtual networks (VNets) and, in turn, to virtual channels (VCs) within each VNet. To reduce uneven writes across the buffers, we propose policies to reduce intra-VNet write variation and inter-VNet write variation. The former performs write-variation-aware VC allocation in each VNet, and the latter performs write-variation-aware buffer assignment to each VNet. Experimental evaluation on a full-system simulator shows that the proposed policies reduce write variation to almost 0% and improve lifetime by 3.3 and 19.9 times for intra-VNet and inter-VNet, respectively. We also get significant gains in the energy-delay product.
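The intra-VNet idea in the abstract, steering each new packet to a less-worn VC, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not define its write-variation metric or its allocation rule, so the coefficient of variation used here, the "least-written free VC" rule, the assumption that every VC is free at each allocation, and all function names are illustrative assumptions.

```python
from statistics import mean, pstdev


def write_variation(write_counts):
    """Coefficient of variation of per-VC write counts (assumed metric; 0.0 = even wear)."""
    avg = mean(write_counts)
    return pstdev(write_counts) / avg if avg else 0.0


def first_free(write_counts, free_vcs):
    """Baseline policy (assumed): always take the lowest-indexed free VC."""
    return free_vcs[0] if free_vcs else None


def least_written(write_counts, free_vcs):
    """Wear-aware policy (assumed): take the free VC with the fewest recorded writes."""
    if not free_vcs:
        return None
    return min(free_vcs, key=lambda vc: (write_counts[vc], vc))


def simulate(num_vcs, num_packets, flits_per_packet, policy):
    """Toy model: every VC is free at each allocation; each flit costs one buffer write."""
    counts = [0] * num_vcs
    for _ in range(num_packets):
        vc = policy(counts, list(range(num_vcs)))
        counts[vc] += flits_per_packet
    return counts


baseline = simulate(4, 100, 5, first_free)
balanced = simulate(4, 100, 5, least_written)

# Assumption: lifetime is limited by the most-written VC, so the relative
# lifetime gain is the ratio of the worst-case write counts.
lifetime_gain = max(baseline) / max(balanced)

print(baseline, round(write_variation(baseline), 3))
print(balanced, round(write_variation(balanced), 3))
print(lifetime_gain)
```

In this toy run the first-free policy puts all 500 flit writes on one VC, while the least-written policy spreads them as 125 writes per VC, so the variation drops to zero and the worst-case wear falls by a factor of four. Real traffic would not leave every VC free at each allocation, so the gain in practice depends on occupancy.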
Citations
Journal ArticleDOI
TL;DR: This tutorial, an extended version of Kargar and Nawab (2021), explores state-of-the-art work on deploying NVMs in the database and storage systems communities and the ways their limitations are being handled within these communities.
Abstract: Recently, non-volatile memory (NVM) technology has revolutionized the landscape of memory systems. With many advantages, such as non-volatility and near-zero standby power consumption, these byte-addressable memory technologies are taking the place of DRAMs. Nonetheless, they also present some limitations, such as limited write endurance, which hinders their widespread use in today’s systems. Furthermore, adjusting current data management systems to embrace these new memory technologies and all their potential is proving to be a nontrivial task. Because of this, a substantial amount of research has been done, from both the database community and the storage systems community, that tries to improve various aspects of NVMs to integrate these technologies into the memory hierarchy. In this work, which is the extended version of Kargar and Nawab (Proc. VLDB Endowment 14(12):3194–3197, 2021), we explore state-of-the-art work on deploying NVMs in database and storage systems communities and the ways their limitations are being handled within these communities. In particular, we focus on (1) the challenges that are related to high energy consumption, low write endurance and asymmetric read/write costs and (2) how these challenges can be solved using hardware and software solutions, especially by reducing the number of bit flips in write operations. We believe that this area has not gained enough attention in the data management community and this tutorial will provide information on how to integrate recent advances from the NVM storage community into existing and future data management systems.

4 citations

Proceedings ArticleDOI
10 Aug 2020
TL;DR: This paper proposes a write reduction technique based on dirty flits present in write-back data packets, which significantly decreases total and dynamic network power consumption and remarkably improves lifetime.
Abstract: In a multi-core system, communication across cores is managed by an on-chip interconnect called Network-on-Chip (NoC). The utilization of NoC results in limitations such as high communication delay and high network power consumption. The buffers of the NoC router consume a considerable amount of leakage power. This paper attempts to reduce leakage power consumption by using Non-Volatile Memory (NVM) technology-based buffers. NVM technology has the advantage of higher density and low leakage but suffers from costly write operations and weaker write endurance. These characteristics impact the total network power consumption, network latency, and lifetime of the router as a whole. In this paper, we propose a write reduction technique, which is based on dirty flits present in write-back data packets. The method also suggests a dirty flit-based Virtual Channel (VC) allocation technique that distributes writes in NVM technology-based VCs to improve the lifetime of NVM buffers. The experimental evaluation on the full system simulator shows that the proposed policy obtains a 53% reduction in write-back flits, which results in 27% fewer total network flits on average. All of this results in a significant decrease in total and dynamic network power consumption. The policy also shows remarkable improvement in the lifetime.

2 citations


Cites background from "Write Variation Aware Buffer Assign..."

  • ...[16, 17] presented wear-leveling techniques to remove unwanted write variation in NVM based buffers to improve the lifetime of NoC routers....

Journal ArticleDOI
TL;DR: This article proposes keeping the routers always powered ON to maintain constant connectivity and investigates approaches that combine SRAM and nonvolatile spin-transfer torque random access memory-based VCs in the routers, which yield significant energy savings while maintaining connectivity.
Abstract: In the era of dark silicon, several components on the chip [i.e., cores, memory, and network on chip (NoC)] need to be powered-off or run in low-power mode. This is mainly due to the increased leakage power consumption at smaller technology nodes. Other than the power consumed by cores and caches, power and performance of the interconnects is a significant factor as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. To support dark silicon and save energy, a popular approach is to power off the routers and wake them up when needed. However, this affects the packet latency, and we need to observe the traffic through the nodes to decide when to turn the routers ON or OFF. In this article, we propose to keep the routers always powered ON to maintain constant connectivity and investigate various approaches. One proposal is to frequency scale the routers connected to powered OFF nodes, and the other proposals are to use a combination of SRAM and nonvolatile spin-transfer torque random access memory-based VCs in the routers. By managing which VCs are active at a given time, we achieve energy savings. The proposals are evaluated by varying the percentage of dark nodes on the chip. The experimental results show that all proposals yield significant energy savings while maintaining connectivity.

1 citation


Cites background from "Write Variation Aware Buffer Assign..."

  • ...Rani and Kapoor [44], [45] presented wear-leveling techniques to remove unwanted write variation in NVM-based buffers to improve the lifetime of NoC routers....

Proceedings ArticleDOI
01 Aug 2020
TL;DR: The experimental results show that the proposed design can effectively avoid the uneven bit-level wearing, when compared with page-based FTL on NAND-SPIN.
Abstract: Non-volatile random access memory (NVRAM) has been regarded as a promising DRAM alternative with its nonvolatility, near-zero idle power consumption, and byte addressability. In particular, some NVRAM devices, such as Spin Torque Transfer (STT) RAM, can provide the same or better access performance and lower power consumption when compared with dynamic random access memory (DRAM). These features make NVRAM an attractive DRAM replacement on NAND flash storage for resolving the management overhead of the flash translation layer (FTL). For instance, when adopting NVRAM for storing the mapping entries of FTL, the overheads of loading and storing the mapping entries between the non-volatile NAND flash and the volatile DRAM can be eliminated. Nevertheless, due to the limited lifetime of NVRAM, the bit-level update behavior of FTL may lead to uneven bit-level wearing, and the lifetime capacity of the less-worn NVRAM cells could be underutilized. This observation motivates this study to utilize the emerging NAND-like Spin Torque Transfer memory (NAND-SPIN) for alleviating the uneven bit-level wearing of NVRAM-based FTL and making the best of the lifetime capacity of each NAND-SPIN cell. The experimental results show that the proposed design can effectively avoid the uneven bit-level wearing, when compared with page-based FTL on NAND-SPIN.

Cites methods from "Write Variation Aware Buffer Assign..."

  • ...Based on previous study [7] and assuming NAND-SPIN has the same MTJ size as STT-RAM, the lifetime is predicted to be 4 × 10^12....

References
Journal ArticleDOI
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Abstract: The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and flexible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86). The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

4,039 citations


"Write Variation Aware Buffer Assign..." refers methods in this paper

  • ...We evaluate our proposed approaches on a full system Gem5 [32], a multi-core simulator, with Garnet2....

Proceedings ArticleDOI
25 Oct 2008
TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.
Abstract: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.

3,514 citations


"Write Variation Aware Buffer Assign..." refers methods in this paper

  • ...We evaluate our work with PARSEC [37] and SPEC CPU2006 [38] benchmark suites....

Book
01 Jan 2004
TL;DR: This book offers a detailed and comprehensive presentation of the basic principles of interconnection network design, clearly illustrating them with numerous examples, chapter exercises, and case studies, allowing a designer to see all the steps of the process from abstract design to concrete implementation.
Abstract: One of the greatest challenges faced by designers of digital systems is optimizing the communication and interconnection between system components. Interconnection networks offer an attractive and economical solution to this communication crisis and are fast becoming pervasive in digital systems. Current trends suggest that this communication bottleneck will be even more problematic when designing future generations of machines. Consequently, the anatomy of an interconnection network router and science of interconnection network design will only grow in importance in the coming years. This book offers a detailed and comprehensive presentation of the basic principles of interconnection network design, clearly illustrating them with numerous examples, chapter exercises, and case studies. It incorporates hardware-level descriptions of concepts, allowing a designer to see all the steps of the process from abstract design to concrete implementation.
  • Case studies throughout the book draw on extensive author experience in designing interconnection networks over a period of more than twenty years, providing real world examples of what works, and what doesn't.
  • Tightly couples concepts with implementation costs to facilitate a deeper understanding of the tradeoffs in the design of a practical network.
  • A set of examples and exercises in every chapter help the reader to fully understand all the implications of every design decision.
Table of Contents Chapter 1 Introduction to Interconnection Networks 1.1 Three Questions About Interconnection Networks 1.2 Uses of Interconnection Networks 1.3 Network Basics 1.4 History 1.5 Organization of this Book Chapter 2 A Simple Interconnection Network 2.1 Network Specifications and Constraints 2.2 Topology 2.3 Routing 2.4 Flow Control 2.5 Router Design 2.6 Performance Analysis 2.7 Exercises Chapter 3 Topology Basics 3.1 Nomenclature 3.2 Traffic Patterns 3.3 Performance 3.4 Packaging Cost 3.5 Case Study: The SGI Origin 2000 3.6 Bibliographic Notes 3.7 Exercises Chapter 4 Butterfly Networks 4.1 The Structure of Butterfly Networks 4.2 Isomorphic Butterflies 4.3 Performance and Packaging Cost 4.4 Path Diversity and Extra Stages 4.5 Case Study: The BBN Butterfly 4.6 Bibliographic Notes 4.7 Exercises Chapter 5 Torus Networks 5.1 The Structure of Torus Networks 5.2 Performance 5.3 Building Mesh and Torus Networks 5.4 Express Cubes 5.5 Case Study: The MIT J-Machine 5.6 Bibliographic Notes 5.7 Exercises Chapter 6 Non-Blocking Networks 6.1 Non-Blocking vs. 
Non-Interfering Networks 6.2 Crossbar Networks 6.3 Clos Networks 6.4 Benes Networks 6.5 Sorting Networks 6.6 Case Study: The Velio VC2002 (Zeus) Grooming Switch 6.7 Bibliographic Notes 6.8 Exercises Chapter 7 Slicing and Dicing 7.1 Concentrators and Distributors 7.2 Slicing and Dicing 7.3 Slicing Multistage Networks 7.4 Case Study: Bit Slicing in the Tiny Tera 7.5 Bibliographic Notes 7.6 Exercises Chapter 8 Routing Basics 8.1 A Routing Example 8.2 Taxonomy of Routing Algorithms 8.3 The Routing Relation 8.4 Deterministic Routing 8.5 Case Study: Dimension-Order Routing in the Cray T3D 8.6 Bibliographic Notes 8.7 Exercises Chapter 9 Oblivious Routing 9.1 Valiant's Randomized Routing Algorithm 9.2 Minimal Oblivious Routing 9.3 Load-Balanced Oblivious Routing 9.4 Analysis of Oblivious Routing 9.5 Case Study: Oblivious Routing in the Avici Terabit Switch Router(TSR) 9.6 Bibliographic Notes 9.7 Exercises Chapter 10 Adaptive Routing 10.1 Adaptive Routing Basics 10.2 Minimal Adaptive Routing 10.3 Fully Adaptive Routing 10.4 Load-Balanced Adaptive Routing 10.5 Search-Based Routing 10.6 Case Study: Adaptive Routing in the Thinking Machines CM-5 10.7 Bibliographic Notes 10.8 Exercises Chapter 11 Routing Mechanics 11.1 Table-Based Routing 11.2 Algorithmic Routing 11.3 Case Study: Oblivious Source Routing in the IBM Vulcan Network 11.4 Bibliographic Notes 11.5 Exercises Chapter 12 Flow Control Basics 12.1 Resources and Allocation Units 12.2 Bufferless Flow Control 12.3 Circuit Switching 12.4 Bibliographic Notes 12.5 Exercises Chapter 13 Buffered Flow Control 13.1 Packet-Buffer Flow Control 13.2 Flit-Buffer Flow Control 13.3 Buffer Management and Backpressure 13.4 Flit-Reservation Flow Control 13.5 Bibliographic Notes 13.6 Exercises Chapter 14 Deadlock and Livelock 14.1 Deadlock 14.2 Deadlock Avoidance 14.3 Adaptive Routing 14.4 Deadlock Recovery 14.5 Livelock 14.6 Case Study: Deadlock Avoidance in the Cray T3E 14.7 Bibliographic Notes 14.8 Exercises Chapter 15 Quality of Service 
15.1 Service Classes and Service Contracts 15.2 Burstiness and Network Delays 15.3 Implementation of Guaranteed Services 15.4 Implementation of Best-Effort Services 15.5 Separation of Resources 15.6 Case Study: ATM Service Classes 15.7 Case Study: Virtual Networks in the Avici TSR 15.8 Bibliographic Notes 15.9 Exercises Chapter 16 Router Architecture 16.1 Basic Router Architecture 16.2 Stalls 16.3 Closing the Loop with Credits 16.4 Reallocating a Channel 16.5 Speculation and Lookahead 16.6 Flit and Credit Encoding 16.7 Case Study: The Alpha 21364 Router 16.8 Bibliographic Notes 16.9 Exercises Chapter 17 Router Datapath Components 17.1 Input Buffer Organization 17.2 Switches 17.3 Output Organization 17.4 Case Study: The Datapath of the IBM Colony Router 17.5 Bibliographic Notes 17.6 Exercises Chapter 18 Arbitration 18.1 Arbitration Timing 18.2 Fairness 18.3 Fixed Priority Arbiter 18.4 Variable Priority Iterative Arbiters 18.5 Matrix Arbiter 18.6 Queuing Arbiter 18.7 Exercises Chapter 19 Allocation 19.1 Representations 19.2 Exact Algorithms 19.3 Separable Allocators 19.4 Wavefront Allocator 19.5 Incremental vs. 
Batch Allocation 19.6 Multistage Allocation 19.7 Performance of Allocators 19.8 Case Study: The Tiny Tera Allocator 19.9 Bibliographic Notes 19.10 Exercises Chapter 20 Network Interfaces 20.1 Processor-Network Interface 20.2 Shared-Memory Interface 20.3 Line-Fabric Interface 20.4 Case Study: The MIT M-Machine Network Interface 20.5 Bibliographic Notes 20.6 Exercises Chapter 21 Error Control 21.1 Know Thy Enemy: Failure Modes and Fault Models 21.2 The Error Control Process: Detection, Containment, and Recovery 21.3 Link Level Error Control 21.4 Router Error Control 21.5 Network-Level Error Control 21.6 End-to-end Error Control 21.7 Bibliographic Notes 21.8 Exercises Chapter 22 Buses 22.1 Bus Basics 22.2 Bus Arbitration 22.3 High Performance Bus Protocol 22.4 From Buses to Networks 22.5 Case Study: The PCI Bus 22.6 Bibliographic Notes 22.7 Exercises Chapter 23 Performance Analysis 23.1 Measures of Interconnection Network Performance 23.2 Analysis 23.3 Validation 23.4 Case Study: Efficiency and Loss in the BBN Monarch Network 23.5 Bibliographic Notes 23.6 Exercises Chapter 24 Simulation 24.1 Levels of Detail 24.2 Network Workloads 24.3 Simulation Measurements 24.4 Simulator Design 24.5 Bibliographic Notes 24.6 Exercises Chapter 25 Simulation Examples 25.1 Routing 25.2 Flow Control Performance 25.3 Fault Tolerance Appendix A Nomenclature Appendix B Glossary Appendix C Network Simulator

3,233 citations


Additional excerpts

  • ..., 16×16 mesh network using synthetic traffic patterns [39]....

Journal ArticleDOI
John L. Henning
TL;DR: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006, which replaces CPU2000, and the SPEC CPU benchmarks are widely used in both industry and academia.
Abstract: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006 [2], which replaces CPU2000. The SPEC CPU benchmarks are widely used in both industry and academia [3].

1,864 citations


"Write Variation Aware Buffer Assign..." refers methods in this paper

  • ...We evaluate our work with PARSEC [37] and SPEC CPU2006 [38] benchmark suites....

  • ...From the list of SPEC CPU2006 benchmarks, we made 12 multi-programmed workloads for 16 cores and 6 for 64 cores....

Journal ArticleDOI
TL;DR: NVSim is developed, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash and is expected to help boost architecture-level NVM-related studies.
Abstract: Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated new NVM candidate technologies, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive random-access memory (ReRAM) are regarded as the most promising candidates. As the ultimate goal of this NVM research is to deploy them into multiple levels in the memory hierarchy, it is necessary to explore the wide NVM design space and find the proper implementation at different memory hierarchy levels from highly latency-optimized caches to highly density-optimized secondary storage. While abundant tools are available as SRAM/DRAM design assistants, similar tools for NVM designs are currently missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies.

1,100 citations


"Write Variation Aware Buffer Assign..." refers methods in this paper

  • ...We use Cacti-STT [35] and NVSim [36] to get SRAM and STT-RAM latency, read-write energy, and leakage power....
