Jude A. Rivers
Bio: Jude A. Rivers is an academic researcher from IBM. The author has contributed to research in topics: Cache & Cache invalidation. The author has an hindex of 25, co-authored 67 publications receiving 4427 citations.
••20 Jun 2009
TL;DR: This paper analyzes a PCM-based hybrid main memory system using an architecture level model of PCM and proposes simple organizational and management solutions of the hybrid memory that reduces the write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.
Abstract: The memory subsystem accounts for a significant cost and power budget of a computer system. Current DRAM-based main memory systems are starting to hit the power and cost limit. An alternative memory technology that uses resistance contrast in phase-change materials is being actively investigated in the circuits community. Phase Change Memory (PCM) devices offer more density relative to DRAM, and can help increase main memory capacity of future systems while remaining within the cost and power constraints.In this paper, we analyze a PCM-based hybrid main memory system using an architecture level model of PCM.We explore the trade-offs for a main memory system consisting of PCMstorage coupled with a small DRAM buffer. Such an architecture has the latency benefits of DRAM and the capacity benefits of PCM. Our evaluations for a baseline system of 16-cores with 8GB DRAM show that, on average, PCM can reduce page faults by 5X and provide a speedup of 3X. As PCM is projected to have limited write endurance, we also propose simple organizational and management solutions of the hybrid memory that reduces the write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.
••28 Jun 2004
TL;DR: The results imply that leveraging a single microarchitecture design for multiple remaps across a few technology generations will become increasingly difficult, and motivate a need for workload specific, microarch Architectural lifetime reliability awareness at an early design stage.
Abstract: The relentless scaling of CMOS technology has provided a steady increase in processor performance for the past three decades. However, increased power densities (hence temperatures) and other scaling effects have an adverse impact on long-term processor lifetime reliability. This paper represents a first attempt at quantifying the impact of scaling on lifetime reliability due to intrinsic hard errors, taking workload characteristics into consideration. For our quantitative evaluation, we use RAMP (Srinivasan et al., 2004), a previously proposed industrial-strength model that provides reliability estimates for a workload, but for a given technology. We extend RAMP by adding scaling specific parameters to enable workload-dependent lifetime reliability evaluation at different technologies. We show that (1) scaling has a significant impact on processor hard failure rates - on average, with SPEC benchmarks, we find the failure rate of a scaled 65nm processor to be 316% higher than a similarly pipelined 180nm processor; (2) time-dependent dielectric breakdown and electromigration have the largest increases; and (3) with scaling, the difference in reliability from running at worst-case vs. typical workload operating conditions increases significantly, as does the difference from running different workloads. Our results imply that leveraging a single microarchitecture design for multiple remaps across a few technology generations will become increasingly difficult, and motivate a need for workload specific, microarchitectural lifetime reliability awareness at an early design stage.
••02 Mar 2004
TL;DR: This paper proposes dynamic reliability management (DRM) - a technique where the processor can respond to changing application behavior to maintain its lifetime reliability target, and describes an architecture-level model and its implementation that can dynamically track lifetime reliability, responding to changes in application behavior.
Abstract: Ensuring long processor lifetimes by limiting failuresdue to wear-out related hard errors is a critical requirementfor all microprocessor manufacturers. We observethat continuous device scaling and increasing temperaturesare making lifetime reliability targets even harder to meet.However, current methodologies for qualifying lifetime reliabilityare overly conservative since they assume worst-caseoperating conditions. This paper makes the case thatthe continued use of such methodologies will significantlyand unnecessarily constrain performance. Instead, lifetimereliability awareness at the microarchitectural design stagecan mitigate this problem, by designing processors that dynamicallyadapt in response to the observed usage to meeta reliability target.We make two specific contributions. First, we describean architecture-level model and its implementation, calledRAMP, that can dynamically track lifetime reliability, respondingto changes in application behavior. RAMP isbased on state-of-the-art device models for different wear-outmechanisms. Second, we propose dynamic reliabilitymanagement (DRM) - a technique where the processorcan respond to changing application behavior to maintainits lifetime reliability target. In contrast to currentworst-case behavior based reliability qualification methodologies,DRM allows processors to be qualified for reliabilityat lower (but more likely) operating points than theworst case. Using RAMP, we show that this can save costand/or improve performance, that dynamic voltage scalingis an effective response technique for DRM, and that dynamicthermal management neither subsumes nor is sub-sumedby DRM.
••01 May 2005
TL;DR: This analysis shows that exploiting structural redundancy can provide significant reliability benefits, and it is shown that GPD is the superior technique when only limited performance or cost resources can be sacrificed for reliability.
Abstract: Increased power densities (and resultant temperatures) and other effects of device scaling are predicted to cause significant lifetime reliability problems in the near future. In this paper, we study two techniques that leverage microarchitectural structural redundancy for lifetime reliability enhancement. First, in structural duplication (SD), redundant microarchitectural structures are added to the processor and designated as spares. Spare structures can be turned on when the original structure fails, increasing the processorýs lifetime. Second, graceful performance degradation (GPD) is a technique which exploits existing microarchitectural redundancy for reliability. Redundant structures that fail are shut down while still maintaining functionality, thereby increasing the processorýs lifetime, but at a lower performance. Our analysis shows that exploiting structural redundancy can provide significant reliability benefits, and we present guidelines for efficient usage of these techniques by identifying situations where each is more beneficial. We show that GPD is the superior technique when only limited performance or cost resources can be sacrificed for reliability. Specifically, on average for our systems and applications,GPD increased processor reliability to 1.42 times the base value for less than a 5% loss in performance. On the other hand, for systems where reliability is more important than performance or cost, SD is more beneficial. SD increases reliability to 3.17 times the base value for 2.25 times the base cost, for our applications. Finally, a combination of the two techniques (SD+GPD) provides the highest reliability benefit.
TL;DR: Developing and maintaining industrywide standards for lifetime reliability is a critical task for all microprocessor manufacturers as increasingly smaller feature sizes and increasing power densities are accelerating the onset of wearout-based failures, thus shortening processor life.
Abstract: Developing and maintaining industrywide standards for lifetime reliability is a critical task for all microprocessor manufacturers. Although technology scaling continues to provide significant performance benefits, increasingly smaller feature sizes and increasing power densities are accelerating the onset of wearout-based failures, thus shortening processor life. Microarchitects have traditionally treated processor lifetime reliability as a manufacturing problem, best left to device and process engineers. In current processors, manufacturers enforce lifetime reliability, or qualify it, during device design, circuit layout, manufacture, and chip test. This reliability qualification, which is application-oblivious, is based on estimates of worst case temperature and processor utilization. However, most applications will run at lower temperature and utilization, resulting in higher reliability and longer processor lifetimes than required. As a result, current reliability qualification methodologies are overly conservative, unnecessarily increasing cost or decreasing performance. Sustaining this approach will likely be infeasible in future scaled systems.
20 Sep 2004
••11 Oct 2009
TL;DR: A file system and a hardware architecture that are designed around the properties of persistent, byteaddressable memory, which provides strong reliability guarantees and offers better performance than traditional file systems, even when both are run on top of byte-addressable, persistent memory.
Abstract: Modern computer systems have been built around the assumption that persistent storage is accessed via a slow, block-based interface. However, new byte-addressable, persistent memory technologies such as phase change memory (PCM) offer fast, fine-grained access to persistent storage.In this paper, we present a file system and a hardware architecture that are designed around the properties of persistent, byteaddressable memory. Our file system, BPFS, uses a new technique called short-circuit shadow paging to provide atomic, fine-grained updates to persistent storage. As a result, BPFS provides strong reliability guarantees and offers better performance than traditional file systems, even when both are run on top of byte-addressable, persistent memory. Our hardware architecture enforces atomicity and ordering guarantees required by BPFS while still providing the performance benefits of the L1 and L2 caches.Since these memory technologies are not yet widely available, we evaluate BPFS on DRAM against NTFS on both a RAM disk and a traditional disk. Then, we use microarchitectural simulations to estimate the performance of BPFS on PCM. Despite providing strong safety and consistency guarantees, BPFS on DRAM is typically twice as fast as NTFS on a RAM disk and 4-10 times faster than NTFS on disk. We also show that BPFS on PCM should be significantly faster than a traditional disk-based file system.
••05 Mar 2011
TL;DR: In tests emulating the performance characteristics of forthcoming SCMs, Mnemosyne can persist data as fast as 3 microseconds and can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.
Abstract: New storage-class memory (SCM) technologies, such as phase-change memory, STT-RAM, and memristors, promise user-level access to non-volatile storage through regular memory instructions. These memory devices enable fast user-mode access to persistence, allowing regular in-memory data structures to survive system crashes.In this paper, we present Mnemosyne, a simple interface for programming with persistent memory. Mnemosyne addresses two challenges: how to create and manage such memory, and how to ensure consistency in the presence of failures. Without additional mechanisms, a system failure may leave data structures in SCM in an invalid state, crashing the program the next time it starts.In Mnemosyne, programmers declare global persistent data with the keyword "pstatic" or allocate it dynamically. Mnemosyne provides primitives for directly modifying persistent variables and supports consistent updates through a lightweight transaction mechanism. Compared to past work on disk-based persistent memory, Mnemosyne reduces latency to storage by writing data directly to memory at the granularity of an update rather than writing memory pages back to disk through the file system. In tests emulating the performance characteristics of forthcoming SCMs, we show that Mnemosyne can persist data as fast as 3 microseconds. Furthermore, it provides a 35 percent performance increase when applied in the OpenLDAP directory server. In microbenchmark studies we find that Mnemosyne can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.
••05 Mar 2011
TL;DR: A lightweight, high-performance persistent object system called NV-heaps is implemented that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about.
Abstract: Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent.We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
••12 Dec 2009
TL;DR: Start-Gap is proposed, a simple, novel, and effective wear-leveling technique that uses only two registers that boosts the achievable lifetime of the baseline 16 GB PCM-based system from 5% to 97% of the theoretical maximum, while incurring a total storage overhead of less than 13 bytes and obviating the latency overhead of accessing large tables.
Abstract: Phase Change Memory (PCM) is an emerging memory technology that can increase main memory capacity in a cost-effective and power-efficient manner. However, PCM cells can endure only a maximum of 107 - 108 writes, making a PCM based system have a lifetime of only a few years under ideal conditions. Furthermore, we show that non-uniformity in writes to different cells reduces the achievable lifetime of PCM system by 20x. Writes to PCM cells can be made uniform with Wear-Leveling. Unfortunately, existing wear-leveling techniques require large storage tables and indirection, resulting in significant area and latency overheads. We propose Start-Gap, a simple, novel, and effective wear-leveling technique that uses only two registers. By combining Start-Gap with simple address-space randomization techniques we show that the achievable lifetime of the baseline 16GB PCM-based system is boosted from 5% (with no wear-leveling) to 97% of the theoretical maximum, while incurring a total storage overhead of less than 13 bytes and obviating the latency overhead of accessing large tables. We also analyze the security vulnerabilities for memory systems that have limited write endurance, showing that under adversarial settings, a PCM-based system can fail in less than one minute. We provide a simple extension to Start-Gap that makes PCM-based systems robust to such malicious attacks.