scispace - formally typeset
Search or ask a question
DOI

A Case for Efficient Hardware/Software Cooperative Management of Storage and Memory

24 Jun 2013-Vol. 2013
TL;DR: The goal of this work is to explore the design of a Persistent Memory Manager that coordinates the management of memory and storage under a single hardware unit in a single address space and shows that such a system with a persistent memory can improve energy efficiency and performance.
Abstract: Most applications manipulate persistent data, yet traditional systems decouple data manipulation from persistence in a two-level storage model. Programming languages and system software manipulate data in one set of formats in volatile main memory (DRAM) using a load/store interface, while storage systems maintain persistence in another set of formats in non-volatile memories, such as Flash and hard disk drives in traditional systems, using a file system interface. Unfortunately, such an approach suffers from the system performance and energy overheads of locating data, moving data, and translating data between the different formats of these two levels of storage that are accessed via two vastly different interfaces. Yet today, new non-volatile memory (NVM) technologies show the promise of storage capacity and endurance similar to or better than Flash at latencies comparable to DRAM, making them prime candidates for providing applications a persistent single-level store with a single load/store interface to access all system data. Our key insight is that in future systems equipped with NVM, the energy consumed executing operating system and file system code to access persistent data in traditional systems becomes an increasingly large contributor to total energy. The goal of this work is to explore the design of a Persistent Memory Manager that coordinates the management of memory and storage under a single hardware unit in a single address space. Our initial simulation-based exploration shows that such a system with a persistent memory can improve energy efficiency and performance by eliminating the instructions and data movement traditionally used to perform I/O operations

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
14 Apr 2014
TL;DR: PMFS, a light-weight POSIX file system that exploits PM's byte-addressability to avoid overheads of block-oriented storage and enable direct PM access by applications (with memory-mapped I/O), is implemented.
Abstract: Emerging byte-addressable, non-volatile memory technologies offer performance within an order of magnitude of DRAM, prompting their inclusion in the processor memory subsystem. However, such load/store accessible Persistent Memory (PM) has implications on system design, both hardware and software. In this paper, we explore system software support to enable low-overhead PM access by new and legacy applications. To this end, we implement PMFS, a light-weight POSIX file system that exploits PM's byte-addressability to avoid overheads of block-oriented storage and enable direct PM access by applications (with memory-mapped I/O). PMFS exploits the processor's paging and memory ordering features for optimizations such as fine-grained logging (for consistency) and transparent large page support (for faster memory-mapped I/O). To provide strong consistency guarantees, PMFS requires only a simple hardware primitive that provides software enforceable guarantees of durability and ordering of stores to PM. Finally, PMFS uses the processor's existing features to protect PM from stray writes, thereby improving reliability.Using a hardware emulator, we evaluate PMFS's performance with several workloads over a range of PM performance characteristics. PMFS shows significant (up to an order of magnitude) gains over traditional file systems (such as ext4) on a RAMDISK-like PM block device, demonstrating the benefits of optimizing system software for PM.

622 citations


Cites background from "A Case for Efficient Hardware/Softw..."

  • ...Some researchers have proposed using a single-level store for managing PM, obviating the need to translate between memory and storage formats [32]....

    [...]

Proceedings ArticleDOI
26 May 2013
TL;DR: Three key solution directions are surveyed: enabling new DRAM architectures, functions, interfaces, and better integration of the DRAM and the rest of the system, designing a memory system that employs emerging memory technologies and takes advantage of multiple different technologies, and providing predictable performance and QoS to applications sharing the memory system.
Abstract: The memory system is a fundamental performance and energy bottleneck in almost all computing systems Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck At the same time, DRAM technology is experiencing difficult technology scaling challenges that make the maintenance and enhancement of its capacity, energy-efficiency, and reliability significantly more costly with conventional techniques In this paper, after describing the demands and challenges faced by the memory system, we examine some promising research and design directions to overcome challenges posed by memory scaling Specifically, we survey three key solution directions: 1) enabling new DRAM architectures, functions, interfaces, and better integration of the DRAM and the rest of the system, 2) designing a memory system that employs emerging memory technologies and takes advantage of multiple different technologies, 3) providing predictable performance and QoS to applications sharing the memory system We also briefly describe our ongoing related work in combating scaling challenges of NAND flash memory

270 citations


Cites background or result from "A Case for Efficient Hardware/Softw..."

  • ...Researching how to adapt applications and system software to utilize fast, byteaddressable non-volatile main memory is an important research direction to pursue [51]....

    [...]

  • ...Our preliminary evaluations show that the use of such a unit, if scalable and efficient, can eliminate the energy inefficiency and performance overheads of the two-level storage model, improving both performance and energy-efficiency of the overall system, especially for data-intensive workloads [51]....

    [...]

  • ...[51] describe the vision and research challenges of a persistent memory manager (PMM), a hardware acceleration unit that coordinates and unifies memory/storage management in a single address space that spans potentially multiple different memory technologies (DRAM, NVM, flash) via hardware/software cooperation....

    [...]

  • ...In fact, if we keep the traditional two-level memory/storage model in the presence of these fast NVM devices as part of storage, the operating system and file system code for locating, moving, and translating persistent data from the non-volatile NVM devices to volatile DRAM for manipulation purposes becomes a great bottleneck, causing most of the energy consumption and degrading performance by an order of magnitude in some data-intensive workloads, as we showed in recent work [51]....

    [...]

  • ...Unfortunately, such a decoupled memory/storage model managed via vastly different techniques (fast, hardware-accelerated memory management units on one hand, and slow operating/file system (OS/FS) software on the other) suffers from large inefficiencies in locating data, moving data, and translating data between the different formats of these two levels of storage that are accessed via two vastly different interfacesleading to potentially large amounts of wasted work and energy [51]....

    [...]

Journal ArticleDOI
TL;DR: A survey of software techniques that have been proposed to exploit the advantages and mitigate the disadvantages of NVMs when used for designing memory systems, and, in particular, secondary storage and main memory.
Abstract: Non-volatile memory (NVM) devices, such as Flash, phase change RAM, spin transfer torque RAM, and resistive RAM, offer several advantages and challenges when compared to conventional memory technologies, such as DRAM and magnetic hard disk drives (HDDs). In this paper, we present a survey of software techniques that have been proposed to exploit the advantages and mitigate the disadvantages of NVMs when used for designing memory systems, and, in particular, secondary storage (e.g., solid state drive) and main memory. We classify these software techniques along several dimensions to highlight their similarities and differences. Given that NVMs are growing in popularity, we believe that this survey will motivate further research in the field of software technology for NVMs.

244 citations


Cites background from "A Case for Efficient Hardware/Softw..."

  • ...…key research challenges such as integrating them in memory/storage hierarchy to design persistent memory systems and complement conventional memory technologies, and enhancing lifetime, reliability and performance etc. Sections 3 and 4 discuss the techniques proposed for addressing these issues....

    [...]

Journal ArticleDOI
12 Oct 2014
TL;DR: This article describes three major new research challenges and solution directions in enabling new DRAM architectures, functions, interfaces, and better integration of the DRAM and the rest of the system and designs a memory system that employs emerging non-volatile memory technologies and takes advantage of multiple different technologies.
Abstract: The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM technology is experiencing difficult technology scaling challenges that make the maintenance and enhancement of its capacity, energyefficiency, and reliability significantly more costly with conventional techniques.In this article, after describing the demands and challenges faced by the memory system, we examine some promising research and design directions to overcome challenges posed by memory scaling. Specifically, we describe three major new research challenges and solution directions: 1 enabling new DRAM architectures, functions, interfaces, and better integration of the DRAM and the rest of the system an approach we call system-DRAM co-design, 2 designing a memory system that employs emerging non-volatile memory technologies and takes advantage of multiple different technologies i.e., hybrid memory systems, 3 providing predictable performance and QoS to applications sharing the memory system i.e., QoS-aware memory systems. We also briefly describe our ongoing related work in combating scaling challenges of NAND flash memory.

188 citations


Cites background or result from "A Case for Efficient Hardware/Softw..."

  • ...[127] describe the vision and research challenges of a persistent memory manager (PMM), a hardware acceleration unit that coordinates and unifies memory/storage Research Problems and Opportunities in Memory Systems...

    [...]

  • ...Such emerging technologies can enable new opportunities in system design, including, for example, the unification of memory and storage subsystems [127]....

    [...]

  • ...Our preliminary evaluations show that the use of such a unit, if scalable and efficient, can greatly reduce the energy inefficiency and performance overheads of the two-level storage model, improving both performance and energy-efficiency of the overall system, especially for data-intensive workloads [127]....

    [...]

  • ...In fact, if we keep the traditional two-level memory/storage model in the presence of these fast NVM devices as part of storage, the operating system and file system code for locating, moving, and translating persistent data from the non-volatile NVM devices to volatile DRAM for manipulation purposes becomes a great bottleneck, causing most of the memory energy consumption and degrading performance by an order of magnitude in some data-intensive workloads, as we showed in recent work [127]....

    [...]

  • ..., as in the case of heterogeneous DRAM refresh rates [114], tiered-latency DRAM [109], heterogeneous-reliability memory [122], locality-aware management of hybrid memory systems [201] and the persistent memory manager for a heterogeneous array of memory/storage devices [127], five of the many ideas we have discussed in this paper....

    [...]

Proceedings ArticleDOI
05 Dec 2015
TL;DR: A hardware-assisted DRAM+NVM hybrid persistent memory design, Transparent Hybrid NVM (ThyNVM), which supports software-transparent crash consistency of memory data in a hybrid memory system and efficiently enforce crash consistency through a new dual-scheme checkpointing mechanism.
Abstract: Emerging byte-addressable nonvolatile memories (NVMs) promise persistent memory, which allows processors to directly access persistent data in main memory. Yet, persistent memory systems need to guarantee a consistent memory state in the event of power loss or a system crash (i.e., crash consistency). To guarantee crash consistency, most prior works rely on programmers to (1) partition persistent and transient memory data and (2) use specialized software interfaces when updating persistent memory data. As a result, taking advantage of persistent memory requires significant programmer effort, e.g., to implement new programs as well as modify legacy programs. Use cases and adoption of persistent memory can therefore be largely limited. In this paper, we propose a hardware-assisted DRAM+NVM hybrid persistent memory design, Transparent Hybrid NVM (ThyNVM), which supports software-transparent crash consistency of memory data in a hybrid memory system. To efficiently enforce crash consistency, we design a new dual-scheme checkpointing mechanism, which efficiently overlaps checkpointing time with application execution time. The key novelty is to enable checkpointing of data at multiple granularities, cache block or page granularity, in a coordinated manner. This design is based on our insight that there is a tradeoff between the application stall time due to checkpointing and the hardware storage overhead of the metadata for checkpointing, both of which are dictated by the granularity of checkpointed data. To get the best of the tradeoff, our technique adapts the checkpointing granularity to the write locality characteristics of the data and coordinates the management of multiple-granularity updates. Our evaluation across a variety of applications shows that ThyNVM performs within 4.9% of an idealized DRAM-only system that can provide crash consistency at no cost.

172 citations


Cites background from "A Case for Efficient Hardware/Softw..."

  • ...Hardware support for persistent memory is receiving increasing attention [45, 14, 83, 49, 58, 32, 42, 57, 43, 84]....

    [...]

  • ...It offers an essential benefit to applications: applications can directly access persistent data in main memory through a fast, load/store interface, without paging them in/out of storage devices, changing data formats in (de)serialization, or executing costly system calls [45]....

    [...]

References
More filters
Proceedings ArticleDOI
12 Dec 2009
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taking into account configuring clusters with 4 cores gives thebest EDA2P and EDAP.
Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.

2,487 citations


"A Case for Efficient Hardware/Softw..." refers background or methods in this paper

  • ...We modeled energy using the McPAT [27] framework and the values assumed for device power are shown in Table 1....

    [...]

  • ...[27] S. Li et al. McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures....

    [...]

  • ...41W average dynamic power per core / 149W peak power [27]....

    [...]

  • ...Prior works tried to make file lookup and update efficient in software [27, 28] in the presence of persistent memory, and other works proposed using complex and potentially inefficient hardware directory techniques (e....

    [...]

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work proposes, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM.
Abstract: Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as DRAM. In contrast, phase change memory (PCM) storage relies on scalable current and thermal mechanisms. To exploit PCM's scalability as a DRAM alternative, PCM must be architected to address relatively long latencies, high energy writes, and finite endurance.We propose, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM. A baseline PCM system is 1.6x slower and requires 2.2x more energy than a DRAM system. Buffer reorganizations reduce this delay and energy gap to 1.2x and 1.0x, using narrow rows to mitigate write energy and multiple rows to improve locality and write coalescing. Partial writes enhance memory endurance, providing 5.6 years of lifetime. Process scaling will further reduce PCM energy costs and improve endurance.

1,568 citations


"A Case for Efficient Hardware/Softw..." refers background in this paper

  • ...[26] NVM: 45mW/chip static power [18], row buffer read (write): 0....

    [...]

Journal ArticleDOI
20 Apr 2010
TL;DR: The physics behind this large resistivity contrast between the amorphous and crystalline states in phase change materials is presented and how it is being exploited to create high density PCM is described.
Abstract: In this paper, recent progress of phase change memory (PCM) is reviewed. The electrical and thermal properties of phase change materials are surveyed with a focus on the scalability of the materials and their impact on device design. Innovations in the device structure, memory cell selector, and strategies for achieving multibit operation and 3-D, multilayer high-density memory arrays are described. The scaling properties of PCM are illustrated with recent experimental results using special device test structures and novel material synthesis. Factors affecting the reliability of PCM are discussed.

1,488 citations

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper analyzes a PCM-based hybrid main memory system using an architecture level model of PCM and proposes simple organizational and management solutions of the hybrid memory that reduces the write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.
Abstract: The memory subsystem accounts for a significant cost and power budget of a computer system. Current DRAM-based main memory systems are starting to hit the power and cost limit. An alternative memory technology that uses resistance contrast in phase-change materials is being actively investigated in the circuits community. Phase Change Memory (PCM) devices offer more density relative to DRAM, and can help increase main memory capacity of future systems while remaining within the cost and power constraints.In this paper, we analyze a PCM-based hybrid main memory system using an architecture level model of PCM.We explore the trade-offs for a main memory system consisting of PCMstorage coupled with a small DRAM buffer. Such an architecture has the latency benefits of DRAM and the capacity benefits of PCM. Our evaluations for a baseline system of 16-cores with 8GB DRAM show that, on average, PCM can reduce page faults by 5X and provide a speedup of 3X. As PCM is projected to have limited write endurance, we also propose simple organizational and management solutions of the hybrid memory that reduces the write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.

1,451 citations


"A Case for Efficient Hardware/Softw..." refers background in this paper

  • ...a DRAM cache, as others have shown in the context of heterogeneous main memories [21, 30, 31, 37, 38, 49]....

    [...]

Patent
27 Nov 2008
TL;DR: In this paper, recent progress of phase change memory (PCM) is reviewed and innovations in the device structure, memory cell selector, and strategies for achieving multibit operation and 3D, multilayer high-density memory arrays are described.
Abstract: A phase-change memory element with side-wall contacts is disclosed, which has a bottom electrode. A non-metallic layer is formed on the electrode, exposing the periphery of the top surface of the electrode. A first electrical contact is on the non-metallic layer to connect the electrode. A dielectric layer is on and covering the first electrical contact. A second electrical contact is on the dielectric layer. An opening is to pass through the second electrical contact, the dielectric layer, and the first electrical contact and preferably separated from the electrode by the non-metallic layer. A phase-change material is to occupy one portion of the opening, wherein the first and second electrical contacts interface the phase-change material at the side-walls of the phase-change material. A second non-metallic layer may be formed on the second electrical contact. A top electrode contacts the top surface of the outstanding terminal of the second electrical contact.

936 citations


"A Case for Efficient Hardware/Softw..." refers background in this paper

  • ...For I/O-bound benchmarks, using NVM instead of disk (NB) helps performance and energy by using a much faster storage device (in this case, PCM)....

    [...]

  • ...New byte-addressable NVM technologies such as phase-change memory (PCM) [46], spin-transfer torque RAM (STT-RAM) [12, 25], and resistive RAM (RRAM), are expected to have storage capacity and endurance similar to Flash—at latencies comparable to DRAM....

    [...]

  • ...New byte-addressable NVM technologies such as phase-change memory (PCM) [46], spin-transfer torque RAM (STT-RAM) [12, 25], and resistive RAM (RRAM), are expected to have storage capacity and endurance similar to Flash—at latencies...

    [...]