Showing papers on "Memory refresh" published in 2017


Proceedings ArticleDOI
01 Sep 2017
TL;DR: UH-MEM as discussed by the authors is a page management mechanism for various hybrid memories that systematically estimates the utility of migrating a page between different memory types, and uses this information to guide data placement.
Abstract: While the memory footprints of cloud and HPC applications continue to increase, fundamental issues with DRAM scaling are likely to prevent traditional main memory systems, composed of monolithic DRAM, from greatly growing in capacity. Hybrid memory systems can mitigate the scaling limitations of monolithic DRAM by pairing together multiple memory technologies (e.g., different types of DRAM, or DRAM and non-volatile memory) at the same level of the memory hierarchy. The goal of a hybrid main memory is to combine the different advantages of the multiple memory types in a cost-effective manner while avoiding the disadvantages of each technology. Memory pages are placed in and migrated between the different memories within a hybrid memory system, based on the properties of each page. It is important to make intelligent page management (i.e., placement and migration) decisions, as they can significantly affect system performance. In this paper, we propose utility-based hybrid memory management (UH-MEM), a new page management mechanism for various hybrid memories, that systematically estimates the utility (i.e., the system performance benefit) of migrating a page between different memory types, and uses this information to guide data placement. UH-MEM operates in two steps. First, it estimates how much a single application would benefit from migrating one of its pages to a different type of memory, by comprehensively considering access frequency, row buffer locality, and memory-level parallelism. Second, it translates the estimated benefit of a single application to an estimate of the overall system performance benefit from such a migration. We evaluate the effectiveness of UH-MEM with various types of hybrid memories, and show that it significantly improves system performance on each of these hybrid memories. For a memory system with DRAM and non-volatile memory, UH-MEM improves performance by 14% on average (and up to 26%) compared to the best of three evaluated state-of-the-art mechanisms across a large number of data-intensive workloads.

78 citations


Journal ArticleDOI
02 May 2017
TL;DR: This review updates the learning on the fundamental materials and process integration needed for high-volume manufacturing and summarizes very recent progress on array level performance improvement methodology using novel techniques, and circuit level contributions for different applications.
Abstract: Resistive random access memory (RRAM) is regarded as one of the most promising emerging memory technologies for next-generation embedded, standalone nonvolatile memory (NVM), and storage class memory (SCM) due to its speed, density, cost, and scalability. Considerable progress has been made in recent years on the manufacturability of RRAM, with low-density RRAM products now in production and the path to higher density parts becoming clearer. This review updates the learning on the fundamental materials and process integration needed for high-volume manufacturing and summarizes very recent progress on array level performance improvement methodology using novel techniques, and circuit level contributions for different applications. The device performance, array integration, and device/circuit codesign for memory systems are discussed. Novel applications besides embedded memory and standalone memory are addressed, including hardware security, neuromorphic computing, and nonvolatile logic systems.

78 citations


Journal ArticleDOI
Wang Kang, Haotian Wang, Zhaohao Wang, Youguang Zhang, Weisheng Zhao
12 May 2017
TL;DR: A cost-efficient IMP/NMP solution in spin-transfer torque magnetic random access memory (STT–MRAM) without adding any processing units on the memory chip is proposed.
Abstract: In the current big data era, the memory wall issue between the processor and the memory becomes one of the most critical bottlenecks for conventional von Neumann computer architecture. In-memory processing (IMP) or near-memory processing (NMP) paradigms have been proposed to address this problem by adding a small amount of processing units inside/near the memory. Unfortunately, although intensively studied, prior IMP/NMP platforms are practically unsuccessful because of the fabrication complexity and cost of integrating the processing units and memory on the same chip. Recently, emerging nonvolatile memories provide new possibilities for efficiently implementing the IMP/NMP paradigm. In this paper, we propose a cost-efficient IMP/NMP solution in spin-transfer torque magnetic random access memory (STT–MRAM) without adding any processing units on the memory chip. The key idea behind the proposed IMP/NMP solution is to exploit the peripheral circuitry already existing inside memory (or with minimal changes) to perform bitwise logic operations. Such an IMP/NMP platform enables rather fast logic operations as the logic results can be obtained immediately through just a memory-like readout operation. Memory read and logic NOT, AND/NAND, and OR/NOR operations can be achieved and dynamically configured within the same STT–MRAM chip. Functionality and performance are evaluated with hybrid simulations under the 40 nm technology node.

67 citations


Journal ArticleDOI
TL;DR: DESTINY is a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM) and embedded DRAM (eDRAM), domain wall memory (DWM) and Flash memory and also supports modeling MLC designs for NVMs.
Abstract: To enable the design of large capacity memory structures, novel memory technologies such as non-volatile memory (NVM) and novel fabrication approaches, e.g., 3D stacking and multi-level cell (MLC) design, have been explored. The existing modeling tools, however, cover only a few memory technologies, technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM) and embedded DRAM (eDRAM) and 2D memories designed using spin orbit torque RAM (SOT-RAM), domain wall memory (DWM) and Flash memory. In addition to single-level cell (SLC) designs for all these memories, DESTINY also supports modeling MLC designs for NVMs. We have extensively validated DESTINY against commercial and research prototypes of these memories. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g., latency, area or energy-delay product) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e., 2D vs. 3D) for a given optimization target, etc. We believe that DESTINY will boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers.

59 citations


Proceedings ArticleDOI
24 Jun 2017
TL;DR: It is demonstrated that smart memory, memory with compute capability and a packetized interface, can dramatically simplify this problem and have one to two orders of magnitude of lower overheads for performance, space, energy, and memory bandwidth, compared to prior solutions.
Abstract: A practically feasible low-overhead hardware design that provides strong defenses against memory bus side channel remains elusive. This paper observes that smart memory, memory with compute capability and a packetized interface, can dramatically simplify this problem. InvisiMem expands the trust base to include the logic layer in the smart memory to implement cryptographic primitives, which aid in addressing several memory bus side channel vulnerabilities efficiently. This allows the secure host processor to send encrypted addresses over the untrusted memory bus, and thereby eliminates the need for expensive address obfuscation techniques based on Oblivious RAM (ORAM). In addition, smart memory enables efficient solutions for ensuring freshness without using expensive Merkle trees, and mitigates memory bus timing channel using constant heart-beat packets. We demonstrate that InvisiMem designs have one to two orders of magnitude of lower overheads for performance, space, energy, and memory bandwidth, compared to prior solutions.

56 citations


Journal ArticleDOI
TL;DR: A dynamic adaptive replacement policy (DARP) in the shared last-level cache for the DRAM/PCM hybrid main memory is proposed and results have shown that the DARP improved the memory access efficiency by 25.4%.
Abstract: The increasing demand on the main memory capacity is one of the main big data challenges. Dynamic random access memory (DRAM) does not represent the best choice for a main memory, due to high power consumption and low density. However, nonvolatile memory, such as phase-change memory (PCM), represents an additional choice because of its low power consumption and high density. Nevertheless, its high access latency and limited write endurance currently prevent PCM from replacing DRAM. Therefore, a hybrid memory, which combines both the DRAM and the PCM, has become a good alternative to the traditional DRAM memory. The disadvantages of both DRAM and PCM are challenges for the hybrid memory. In this paper, a dynamic adaptive replacement policy (DARP) in the shared last-level cache for the DRAM/PCM hybrid main memory is proposed. The DARP distinguishes the cache data into the PCM data and the DRAM data; then, the algorithm adopts different replacement policies for each data type. Specifically, for the PCM data, the least recently used (LRU) replacement policy is adopted, and for the DRAM data, the DARP is employed according to the process behavior. Experimental results have shown that the DARP improved the memory access efficiency by 25.4%.

55 citations


Journal ArticleDOI
TL;DR: A novel memory architecture called a resource-efficient SRAM-based TCAM (REST), which emulates TCAM functionality using optimal resources and increases the overall emulated TCAM bits/SRAM at the cost of reduced throughput.
Abstract: Static random access memory (SRAM)-based ternary content addressable memory (TCAM) offers TCAM functionality by emulating it with SRAM. However, this emulation suffers from reduced memory efficiency while mapping the TCAM table on SRAM units. This is due to the limited capacity of the physical addresses in the SRAM unit. This brief offers a novel memory architecture called a resource-efficient SRAM-based TCAM (REST), which emulates TCAM functionality using optimal resources. The SRAM unit is divided into multiple virtual blocks to store the address information presented in the TCAM table. This approach virtually increases the overall address space of the SRAM unit, mapping a greater portion of the TCAM table in SRAM and increasing the overall emulated TCAM bits/SRAM at the cost of reduced throughput. A $72 \times 28$-bit REST consumes only one 36-kbit SRAM and a few distributed RAMs via implementation on a Xilinx Kintex-7 field-programmable gate array. It uses only 3.5% of the memory resources compared with a conventional SRAM-based TCAM (hybrid-partitioned TCAM).

50 citations
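
To make the SRAM-based emulation idea in the abstract above concrete, here is a minimal Python sketch of the generic technique: the search key is split into chunks, each chunk value addresses an SRAM-like table, and each table word is a bitmask of the rules that match that chunk. This is not the REST architecture itself (its virtual-block scheme is not modelled); the function names, the `x` wildcard notation, and the lowest-index priority rule are assumptions made for illustration.

```python
from typing import List


def build_tables(rules: List[str], chunk_bits: int) -> List[List[int]]:
    """Build one SRAM-like table per key chunk (hypothetical helper).

    Each table is indexed by a chunk value; the stored word is a bitmask
    with bit r set when ternary rule r matches that chunk value.
    Rules are strings over '0', '1' and 'x' (don't care).
    """
    width = len(rules[0])
    assert width % chunk_bits == 0, "key width must be a multiple of chunk_bits"
    tables = []
    for start in range(0, width, chunk_bits):
        table = []
        for value in range(1 << chunk_bits):
            bits = format(value, f"0{chunk_bits}b")
            mask = 0
            for r, rule in enumerate(rules):
                pattern = rule[start:start + chunk_bits]
                # A pattern symbol matches if it is 'x' or equals the key bit.
                if all(p in ("x", b) for p, b in zip(pattern, bits)):
                    mask |= 1 << r
            table.append(mask)
        tables.append(table)
    return tables


def lookup(tables: List[List[int]], key: str, chunk_bits: int) -> int:
    """Return the lowest-numbered matching rule, or -1 if none matches."""
    mask = None
    for i, table in enumerate(tables):
        chunk = int(key[i * chunk_bits:(i + 1) * chunk_bits], 2)
        word = table[chunk]  # one SRAM read per chunk
        mask = word if mask is None else mask & word
    if not mask:
        return -1
    return (mask & -mask).bit_length() - 1  # index of lowest set bit


rules = ["10xx", "1x01", "xxxx"]
tables = build_tables(rules, chunk_bits=2)
print(lookup(tables, "1101", chunk_bits=2))
```

Each table needs one entry per possible chunk value, which is why the number of address bits available in an SRAM unit limits how much of a TCAM table can be mapped onto it.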


Journal ArticleDOI
TL;DR: This paper proposes to partially replace DRAM using PCM to optimize the management of flash memory metadata for better system reliability in the presence of power failure and system crash, and presents a write-activity-aware PCM-assisted flash memory management scheme, called PCM-FTL.
Abstract: Phase change memory (PCM) is a promising DRAM alternative because of its non-volatility, high density, low standby power and close-to-DRAM performance. These features make PCM an attractive solution to optimize the management of NAND flash memory in embedded systems. However, PCM's limited write endurance hinders its application in embedded systems. Therefore, how to manage flash memory with PCM, and in particular how to guarantee PCM a reasonable lifetime, becomes a challenging issue. In this paper, we propose to partially replace DRAM using PCM to optimize the management of flash memory metadata for better system reliability in the presence of power failure and system crash. To prolong PCM's lifetime, we present a write-activity-aware PCM-assisted flash memory management scheme, called PCM-FTL. By differentiating sequential and random I/O behaviors, a novel two-level mapping mechanism and a customized wear-leveling scheme are developed to reduce writes to PCM and extend its lifetime. We evaluate PCM-FTL with a variety of general-purpose and mobile I/O workloads. Experimental results show that PCM-FTL can significantly reduce write activities and achieve an even distribution of writes in PCM with very low overhead.

36 citations


Journal ArticleDOI
TL;DR: This paper proposes the smallest solution for soft-error tolerant embedded memory yet to be presented, based on a four-transistor dynamic memory core that internally stores complementary data values to provide an inherent per-bit error detection capability.
Abstract: The limited size and power budgets of space-bound systems often contradict the requirements for reliable circuit operation within high-radiation environments. In this paper, we propose the smallest solution for soft-error tolerant embedded memory yet to be presented. The proposed complementary dual-modular redundancy (CDMR) memory is based on a four-transistor dynamic memory core that internally stores complementary data values to provide an inherent per-bit error detection capability. By adding simple, low-overhead parity, an error-correction capability is added to the memory architecture for robust soft-error protection. The proposed memory was implemented in a 65-nm CMOS technology, displaying as much as a $3.5\times$ smaller silicon footprint than other radiation-hardened bitcells. In addition, the CDMR memory consumes between 48% and 87% less standby power than other considered solutions across the entire operating region.

35 citations
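
The detection-plus-parity idea in the abstract above can be illustrated with a small Python sketch. It uses a simplified model that is an assumption here, not the paper's circuit: each bit is held as a (value, complement) pair, a soft error flips one node so the pair stops being complementary, and a single even-parity bit per word (assumed to be intact) is used to reconstruct the one flagged bit. The function names are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (true node, complement node)


def encode(bits: List[int]) -> Tuple[List[Cell], int]:
    """Store each bit as a complementary pair plus one even-parity bit."""
    cells = [(b, 1 - b) for b in bits]
    parity = sum(bits) % 2
    return cells, parity


def decode(cells: List[Cell], parity: int) -> List[int]:
    """Detect non-complementary pairs and repair a single one using parity."""
    # Per-bit detection: a healthy cell always holds two different values.
    bad = [i for i, (t, c) in enumerate(cells) if t == c]
    if len(bad) > 1:
        raise ValueError("more than one corrupted cell; parity cannot correct")
    bits = [t for t, _ in cells]
    if bad:
        i = bad[0]
        # The missing bit is whatever makes the word parity come out right.
        others = sum(bits[:i] + bits[i + 1:]) % 2
        bits[i] = others ^ parity
    return bits


cells, parity = encode([1, 0, 1, 1])
cells[2] = (0, 0)  # model a soft error on the true node of bit 2
print(decode(cells, parity))
```

Because the complementary pair already says which bit is wrong, a single parity bit is enough to correct it; without that location information, parity alone could only detect the error.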


Journal ArticleDOI
TL;DR: This paper explores the interactions between DRAM and PCM to improve both the performance and the endurance of a DRAM-PCM hybrid main memory and develops a proactive strategy to allocate pages taking both program segments and DRAM conflict misses into consideration.
Abstract: Phase change memory (PCM), given its nonvolatility, potential high density, and low standby power, is a promising candidate to be used as main memory in next generation computer systems. However, to hide its shortcomings of limited endurance and slow write performance, state-of-the-art solutions tend to construct a dynamic RAM (DRAM)-PCM hybrid memory and place write-intensive pages in DRAM. While existing optimizations to this hybrid architecture focus on tuning DRAM configurations to reduce the number of write operations to PCM, this paper explores the interactions between DRAM and PCM to improve both the performance and the endurance of a DRAM-PCM hybrid main memory. Specifically, it exploits the flexibility of mapping virtual pages to physical pages, and develops a proactive strategy to allocate pages taking both program segments and DRAM conflict misses into consideration, thus distributing those heavily written pages across different DRAM sets. Meanwhile, a lifetime-aware DRAM replacement algorithm and a conflict-aware page remapping strategy are proposed to further reduce DRAM misses and PCM writes. Experiments confirm that the proposed techniques are able to improve average memory hit time and reduce maximum PCM write counts thus enhancing both performance and lifetime of a DRAM-PCM hybrid main memory.

32 citations


Journal ArticleDOI
TL;DR: This study aimed for reducing the power consumption and demonstrating the fully functional operation of the 64-kb Josephson-CMOS hybrid memory composed of the low-power CMOS static RAM, Josephson interface circuits, and Josephson current sensors using the Rohm 0.18 μm process and the AIST standard process.
Abstract: We have been developing a Josephson-CMOS hybrid memory with subnanosecond access time in order to overcome the memory bottleneck in single-flux-quantum digital systems. In this study, we aimed for reducing the power consumption of the 64-kb CMOS static RAM. We took three approaches, miniaturization of memory cells, improvement of data drivers, and employment of a binary-tree decoder. By using these techniques, we decreased the power consumption of 64-kb CMOS static RAMs by 54% in the write operation and by 8% in the read operation. Moreover, we aimed for demonstrating the fully functional operation of the 64-kb Josephson-CMOS hybrid memory composed of the low-power CMOS static RAM, Josephson interface circuits, and Josephson current sensors by using the Rohm 0.18 μm process and the AIST standard process 2. We confirmed the correct memory operation for arbitrary address accesses at low speed. The total access time was evaluated to be 1718 ps and the power consumption was estimated to be 27.62 mW in the write operation and 21.25 mW in the read operation in circuit simulations. Based on these estimations, we discuss the access time and the power consumption of hybrid memories using future CMOS processes.

Journal ArticleDOI
TL;DR: In this article, a dynamic random access memory composed of address decoders based on an energy-efficient rapid single-flux-quantum logic, nTron line drivers, a CMOS memory cell array, and Josephson current sensors is presented.
Abstract: We present hybridization of Josephson, CMOS, and nanocryotron (nTron) devices for a large-scale cryogenic memory application. The memory system proposed here is dynamic random access memory composed of address decoders based on an energy-efficient rapid single-flux-quantum logic, nTron line drivers, a CMOS memory cell array, and Josephson current sensors. Because drivers with voltage amplification and decoders are the major causes of power dissipation in the conventional Josephson-CMOS hybrid memory, drastic reduction in power consumption is expected. We show estimates that the power consumption of a 16-Mb memory is reduced to 1.36-2.77 mW, approximately 1/12 of the conventional Josephson-CMOS hybrid memory, and the access time is 0.78 ns for a read operation, when we assume a 65-nm CMOS process and a 1.0-μm Nb/AlO$_x$/Nb process. In the preliminary experiment, we fabricated nTrons using NbTiN thin film that are suitable for hybrid memory implementation, and measured with eight-transistor static random access memory cells fabricated using the Rohm 0.18-μm CMOS process. We successfully triggered the nTron into the normal state, and observed output voltage of ~0.1 V at 13.5 K. The experimental results support the potential of the hybrid memory using NbTiN nTrons.

Journal ArticleDOI
TL;DR: A memristor crossbar memory architecture that utilizes a reduced constraint read-monitored-write scheme and utilizes reduced hardware, aiming to decrease the feedback complexity and latency while still operating with CMOS compatible voltages is presented.
Abstract: Memristor based crossbar memories are prime candidates to succeed the Flash as the mainstream nonvolatile memory due to their density, scalability, write endurance and capability of storing multibit per cell. In this paper, we present a memristor crossbar memory architecture that utilizes a reduced constraint read-monitored-write scheme. The proposed scheme supports multibit storage per cell and utilizes reduced hardware, aiming to decrease the feedback complexity and latency while still operating with CMOS compatible voltages. We additionally present a read technique that can successfully distinguish resistive states under the existence of resistance drift due to read/write disturbances in the array. We also provide derivations of analytical relations to set forth a design methodology in selecting peripheral device parameters.

Journal ArticleDOI
TL;DR: The 128 kb memory architecture based on RRAM technology and 28 nm fully depleted silicon on insulator (FDSOI) CMOS core process is presented with a bottom-up approach, starting from the bit-cell definition up to the complete memory architecture implementation.
Abstract: Emerging nonvolatile memories (NVM) based on resistive switching mechanism such as RRAM are under intense R&D investigation by both academics and industries. They provide high write/read speed, low power, and good endurance (e.g., >$10^{12}$) beyond mainstream NVMs, enabling them to be a good candidate for Flash replacement in microcontroller unit. This replacement could significantly decrease the power consumption and the integration cost on advanced CMOS nodes. This paper presents first the HfO$_2$-based RRAM technology and the associated compact model, which includes related physics and model card fitting experimental electrical characterizations. The 128 kb memory architecture based on RRAM technology and 28 nm fully depleted silicon on insulator (FDSOI) CMOS core process is presented with a bottom-up approach, starting from the bit-cell definition up to the complete memory architecture implementation. The key points of the architecture are the use of standard logic MOS exclusively, avoiding any high voltage MOS usage, program/verify procedure to mitigate cycle to cycle variability issue and direct bit-cell read access for characterization purpose. The proposed architecture is validated using postlayout simulations on MOS and RRAM corner cases.

Patent
Kang Kyu-Chang, Yang Hui-Kap
27 Jul 2017
TL;DR: In this article, the row selection circuit performs an access operation with respect to the memory bank and a hammer refresh operation on a row that is physically adjacent to a row accessed intensively.
Abstract: A memory device includes a memory bank, a row selection circuit and a refresh controller. The memory bank includes a plurality of memory blocks, and each memory block includes a plurality of memory cells arranged in rows and columns. The row selection circuit performs an access operation with respect to the memory bank and a hammer refresh operation with respect to a row that is physically adjacent to a row that is accessed intensively. The refresh controller controls the row selection circuit such that the hammer refresh operation is performed during a row active time for the access operation. The hammer refresh operation may be performed efficiently and performance of the memory device may be enhanced by performing the hammer refresh operation during the row active time for the access operation.
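
As a rough illustration of the bookkeeping behind a hammer refresh, the Python sketch below counts activations per row and, once a row is accessed intensively, reports the physically adjacent rows that should be refreshed. The class name, the fixed `threshold`, and the counter reset are assumptions for illustration; the abstract does not say how intensive access is detected, and the timing aspect (performing the refresh during the row active time) is not modelled.

```python
from collections import Counter
from typing import List


class HammerTracker:
    """Hypothetical per-bank tracker for intensively accessed rows."""

    def __init__(self, num_rows: int, threshold: int) -> None:
        self.num_rows = num_rows
        self.threshold = threshold  # assumed activation limit, not from the source
        self.counts: Counter = Counter()

    def activate(self, row: int) -> List[int]:
        """Record one activation; return neighbour rows to hammer-refresh."""
        self.counts[row] += 1
        if self.counts[row] < self.threshold:
            return []
        # The row has been hammered: restart its count and refresh the
        # physically adjacent rows that exist in this bank.
        self.counts[row] = 0
        return [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]


tracker = HammerTracker(num_rows=8, threshold=3)
for _ in range(3):
    victims = tracker.activate(4)
print(victims)
```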

Journal ArticleDOI
TL;DR: A novel approach to schedule memory requests in Mixed Criticality Systems by enabling the MCS designer to specify memory requirements per task is proposed, and a compact time-division-multiplexing scheduler and framework that constructs optimal schedules to manage requests to off-chip memory are introduced.
Abstract: We propose a novel approach to schedule memory requests in Mixed Criticality Systems (MCS). This approach supports an arbitrary number of criticality levels by enabling the MCS designer to specify memory requirements per task. It retains locality within large-size requests to satisfy memory requirements of all tasks. To achieve this target, we introduce a compact time-division-multiplexing scheduler, and a framework that constructs optimal schedules to manage requests to off-chip memory. We also present a static analysis that guarantees meeting requirements of all tasks. We compare the proposed controller against state-of-the-art memory controllers using both a case study and synthetic experiments.

Journal ArticleDOI
TL;DR: A hierarchical design of 4R1W memory is introduced that requires 25% fewer BRAMs than the previous approach of duplicating the 2R1W module and can achieve higher clock frequencies by alleviating the complex routing in an FPGA.
Abstract: The utilization of block RAMs (BRAMs) is a critical performance factor for multiported memory designs on field-programmable gate arrays (FPGAs). Not only does the excessive demand on BRAMs block the usage of BRAMs from other parts of a design, but the complex routing between BRAMs and logic also limits the operating frequency. This paper first introduces a brand new perspective and a more efficient way of using a conventional two reads one write (2R1W) memory as a 2R1W/4R memory. By exploiting the 2R1W/4R as the building block, this paper introduces a hierarchical design of 4R1W memory that requires 25% fewer BRAMs than the previous approach of duplicating the 2R1W module. Memories with more read/write ports can be extended from the proposed 2R1W/4R memory and the hierarchical 4R1W memory. Compared with previous XOR-based and live value table-based approaches, the proposed designs can, respectively, reduce up to 53% and 69% of BRAM usage for 4R2W memory designs with 8K-depth. For complex multiported designs, the proposed BRAM-efficient approaches can achieve higher clock frequencies by alleviating the complex routing in an FPGA. For 4R3W memory with 8K-depth, the proposed design can save 53% of BRAMs and enhance the operating frequency by 20%.
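
For readers unfamiliar with the XOR-based baseline that the abstract compares against, the following Python sketch shows the general trick for supporting two write ports with single-write banks. It illustrates the prior XOR-based approach in its simplest form, not the design proposed in this paper, and the class and method names are hypothetical. Each write port owns one bank; a write stores the new value XORed with the other bank's current content, so a read recovers the latest value by XORing both banks.

```python
class XorTwoWriteMemory:
    """Behavioural model of a 2-write memory built from two 1-write banks."""

    def __init__(self, depth: int) -> None:
        # One bank per write port; both start at zero so reads return 0.
        self.banks = [[0] * depth, [0] * depth]

    def write(self, port: int, addr: int, value: int) -> None:
        """Write through `port` (0 or 1); only that port's bank is modified."""
        other = self.banks[1 - port][addr]
        self.banks[port][addr] = value ^ other

    def read(self, addr: int) -> int:
        """XOR of both banks yields the most recently written value."""
        return self.banks[0][addr] ^ self.banks[1][addr]


mem = XorTwoWriteMemory(depth=4)
mem.write(0, 1, 5)
mem.write(1, 1, 9)  # a later write from the other port wins
print(mem.read(1))
```

In hardware, every write has to read the other bank first and every read touches all banks, so the banks must be replicated to provide those extra read ports. That replication is one reason BRAM usage grows quickly in multiported designs.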

Proceedings ArticleDOI
14 Oct 2017
TL;DR: ConTutto is the first ever FPGA platform on the memory bus of a server class processor, providing a means for in-line acceleration of certain computations on-route to memory, and enables sensitivity analysis for memory latency while running real applications.
Abstract: We demonstrate the use of an FPGA as a memory buffer in a POWER8® system, creating a novel prototyping platform that enables innovation in the memory subsystem of POWER-based servers. Our platform, called ConTutto, is pin-compatible with POWER8 buffered memory DIMMs and plugs into a memory slot of a standard POWER8 processor system, running at aggregate memory channel speeds of 35 GB/s per link. ConTutto, which means “with everything”, is a platform to experiment with different memory technologies, such as STT-MRAM and NAND Flash, in an end-to-end system context. Enablement of STT-MRAM and NVDIMM using ConTutto shows up to 12.5x lower latency and 7.5x higher bandwidth compared to the respective technologies when attached to the PCIe bus. Moreover, due to the unique attach-point of the FPGA between the processor and system memory, ConTutto provides a means for in-line acceleration of certain computations on-route to memory, and enables sensitivity analysis for memory latency while running real applications. To the best of our knowledge, ConTutto is the first ever FPGA platform on the memory bus of a server class processor.

Proceedings ArticleDOI
27 Mar 2017
TL;DR: This paper monitors the write traffic in a Programmable Logic-in-Memory (PLiM) architecture, and proposes an endurance management scheme for it, capable of handling different trade-offs between write balance, latency, and area of the resulting PLiM implementations.
Abstract: Resistive Random Access Memory (RRAM) is a promising non-volatile memory technology which enables modern in-memory computing architectures. Although RRAMs are known to be superior to conventional memories in many aspects, they suffer from a low write endurance. In this paper, we focus on balancing memory write traffic as a solution to extend the lifetime of resistive crossbar architectures. As a case study, we monitor the write traffic in a Programmable Logic-in-Memory (PLiM) architecture, and propose an endurance management scheme for it. The proposed endurance-aware compilation is capable of handling different trade-offs between write balance, latency, and area of the resulting PLiM implementations. Experimental evaluations on a set of benchmarks including large arithmetic and control functions show that the standard deviation of writes can be reduced by 86.65% on average compared to a naive compiler, while the average number of instructions and RRAM devices also decreases by 36.45% and 13.67%, respectively.

Proceedings ArticleDOI
14 Mar 2017
TL;DR: This paper proposes V-ReRAM, a novel ReRAM crossbar design based on 1TnR cell structure that greatly reduces the number of half-selected cells and thus the sneak leakage and improves RESET performance by exploiting the RESET latency difference among memory cells in ReRAM crossbars.
Abstract: ReRAM (Resistive Random Access Memory) is an emerging non-volatile memory technology that exhibits high cell density and low standby power. ReRAM crossbars, while having the smallest $4F^2$ cell size, suffer from large sneak leakage, which not only wastes dynamic energy but also degrades system performance significantly. In this paper, we propose V-ReRAM, a novel ReRAM crossbar design based on 1TnR cell structure. By reorganizing the peripheral circuit, V-ReRAM greatly reduces the number of half-selected cells and thus the sneak leakage. V-ReRAM further improves RESET performance by exploiting the RESET latency difference among memory cells in ReRAM crossbars. Our experimental results show that, on average, V-ReRAM improves the system performance by 7.3% and reduces memory energy consumption by 72%, compared to the baseline 1T4R based ReRAM crossbar.

Proceedings ArticleDOI
21 Sep 2017
TL;DR: This paper enhances the state-of-the-art by systematically exploiting the implicit refresh of memory access for relaxing the refresh rate, while minimizing the resulting memory errors by modifying the algorithmic parameters that influence the access patterns.
Abstract: The main memory in today's systems is based on DRAMs, which may offer low cost and high density storage for large amounts of data but come with a main drawback: DRAM cells need to be refreshed frequently to retain the stored data. The refresh rate in modern DRAMs is set based on the worst-case retention time without considering access statistics, thereby resulting in very frequent refresh operations. Such a high refresh rate eventually leads to large power and performance overheads, which increase with higher DRAM densities. However, such high refresh rates may not even be required, due to the extremely low probability of the actual occurrence of the assumed worst-case scenarios, or due to the implicit refresh operations that occur during every memory access, a feature that has not yet been studied in depth. In this paper, we enhance the state-of-the-art by systematically exploiting the implicit refresh of memory access for relaxing the refresh rate, while minimizing the resulting memory errors. This is achieved by modifying the algorithmic parameters that influence the access patterns such that all stored data are being touched within a target time interval that is necessary for meeting a target error rate. The proposed method is applied to stencil-based algorithms, which represent a wide class of algorithms used in numerical analysis, image processing and cellular automata applications. The efficacy of the proposed method is demonstrated on an off-the-shelf server running a fully fledged Linux OS, and results show that it is even possible to completely disable DRAM refresh with minor quality loss.
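
The condition that "all stored data are being touched within a target time interval" can be checked on an access trace with a few lines of Python. The sketch below is an illustration under stated assumptions rather than the authors' tooling: accesses are modelled at DRAM-row granularity, every row is assumed to be freshly written at time 0, and the function name and trace format (`(time, row)` pairs) are hypothetical.

```python
from typing import Iterable, List, Tuple


def rows_missing_deadline(
    trace: Iterable[Tuple[float, int]],
    num_rows: int,
    interval: float,
    end_time: float,
) -> List[int]:
    """Return rows that go longer than `interval` without being accessed.

    An access implicitly refreshes the accessed row, so a row is safe
    only if every gap between consecutive touches (including the gap
    from time 0 and the gap up to `end_time`) stays within `interval`.
    """
    last_touch = {row: 0.0 for row in range(num_rows)}
    late = set()
    for t, row in sorted(trace):
        if t - last_touch[row] > interval:
            late.add(row)
        last_touch[row] = t
    # Rows that were not touched again before the end of the run.
    for row, t in last_touch.items():
        if end_time - t > interval:
            late.add(row)
    return sorted(late)


trace = [(10, 0), (20, 1), (40, 0), (50, 1)]
print(rows_missing_deadline(trace, num_rows=3, interval=32, end_time=60))
```

If the returned list is empty for a given interval, the access pattern alone covers every row often enough under this simplified model; otherwise the listed rows would still need explicit refresh or a change to the access pattern.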

Patent
12 Jan 2017
TL;DR: In this paper, an endurance parameter value of a nonvolatile memory included in a non-volatile dual in-line memory module (NVDIMM) can be monitored and compared against a warning threshold value.
Abstract: An endurance parameter value of a non-volatile memory included in a non-volatile dual in-line memory module (NVDIMM) can be monitored and compared against a warning threshold value. In response to the endurance parameter exceeding the warning threshold value, a system alert can be generated, within a host system of the NVDIMM, to inform a system user that the NVDIMM is approaching its end-of-life. If the endurance parameter exceeds a replacement threshold value greater than the warning threshold value, an upgrade process can be initiated. The upgrade process can include copying data from the first non-volatile memory to a volatile memory of the NVDIMM and copying, in response to the first non-volatile memory being replaced with a second non-volatile memory, the data from the volatile memory to the second non-volatile memory.
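
The two-threshold decision described in the abstract above can be sketched as a small Python function. The function name, the return labels, and the use of a strict "greater than" comparison for "exceeding" are assumptions for illustration; the abstract does not define what the endurance parameter measures or how it is read from the module.

```python
def endurance_action(
    value: float,
    warning_threshold: float,
    replacement_threshold: float,
) -> str:
    """Map a monitored endurance value to an action (hypothetical labels).

    "ok"      -> nothing to do
    "alert"   -> warn the system user that end-of-life is approaching
    "upgrade" -> start the copy-out / replace / copy-back process
    """
    if replacement_threshold <= warning_threshold:
        raise ValueError("replacement threshold must exceed warning threshold")
    if value > replacement_threshold:
        return "upgrade"
    if value > warning_threshold:
        return "alert"
    return "ok"


for reading in (10, 85, 96):
    print(reading, endurance_action(reading, 80, 95))
```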

Proceedings ArticleDOI
18 Jun 2017
TL;DR: This work proposes a persistent memory accelerator design, which guarantees NVRAM data persistence by hardware yet leaving cache hierarchy and memory controller operations unaltered, and achieves the performance close to the one without persistence guarantee.
Abstract: Persistent memory places NVRAM on the memory bus, offering fast access to persistent data. Yet maintaining NVRAM data persistence raises a host of challenges. Most proposed schemes either incur much performance overhead or require substantial modifications to existing architectures. We propose a persistent memory accelerator design, which guarantees NVRAM data persistence by hardware yet leaving cache hierarchy and memory controller operations unaltered. A nonvolatile transaction cache keeps an alternative version of data updates side-by-side with the cache hierarchy and paves a new persistent path without affecting original processor execution path. As a result, our design achieves the performance close to the one without persistence guarantee.

Patent
19 Jan 2017
TL;DR: A semiconductor memory as discussed by the authors includes a first memory cell including a first transistor, a second memory cell including a second transistor, and a memory peripherals transistor overlaying the second transistor or underneath the first transistor.
Abstract: A semiconductor memory, including: a first memory cell including a first transistor; a second memory cell including a second transistor; and a memory peripherals transistor overlaying the second transistor or underneath the first transistor, where the second memory cell overlays the first memory cell, and where the first memory cell and the second memory cell have both been processed following a lithography step and accordingly are precisely aligned, and where the memory peripherals transistor is part of a peripherals circuit controlling the memory.

Proceedings ArticleDOI
18 Jun 2017
TL;DR: An age-aware placement framework for RRAM-FPGAs with uniform reconfigurable logic/memory units, consisting of a dynamic reconfiguration region allocation algorithm and a logic/ memory co-placement algorithm, that balances write distributions across the entire FPGA according to logic and memory write frequency differences.
Abstract: Resistive RAM (RRAM) is a promising non-volatile memory (NVM) device which can replace traditional SRAM as on-chip storage for logic and data in FPGAs. While RRAM outperforms SRAM by offering high scalability, low leakage power, and near-zero power-on delay, RRAM-FPGAs have limited programming cycles, and the different write frequencies of memory and logic blocks make the challenge more severe. To overcome this endurance challenge, we propose an age-aware placement framework for RRAM-FPGAs with uniform reconfigurable logic/memory units. The framework, consisting of a dynamic reconfiguration region allocation algorithm and a logic/memory co-placement algorithm, balances write distributions across the entire FPGA according to logic and memory write frequency differences. The proposed algorithms have been integrated into the VTR synthesis flow. Experiments show that the framework achieves 94.9% write reduction, thus effectively extending RRAM-FPGA programming cycles.

Patent
Choi Wonjun, Yang Hui-Kap
18 May 2017
TL;DR: In this paper, a row selection circuit performs the access operation and the refresh operation with respect to the memory bank, while the collision controller generates a wait signal that delays the access operation based on a comparison of the row address associated with the access operation and the refresh address associated with the refresh operation.
Abstract: A memory device includes a memory bank, a command control logic circuit, a row selection circuit, a refresh controller and a collision controller. The memory bank includes a plurality of memory blocks. The command control logic circuit decodes commands received from a memory controller to generate control signals. The command control logic circuit receives an active command for an access operation during a refresh operation. The row selection circuit performs the access operation and the refresh operation with respect to the memory bank. The refresh controller controls the refresh operation. The collision controller generates a wait signal causing a delay of the access operation based on a result of a comparison of a row address associated with the access operation and a refresh address associated with the refresh operation.
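
The address comparison performed by the collision controller can be sketched roughly as follows. The abstract only says that a row address is compared with a refresh address; treating a collision as "both addresses fall in the same memory block", along with the name `wait_signal` and the `rows_per_block` parameter, is an assumption made here for illustration.

```python
def wait_signal(access_row, refresh_row, rows_per_block):
    """Return True when the access should be delayed.

    Assumption: an access collides with an ongoing refresh when both row
    addresses map to the same memory block of the bank.
    """
    if rows_per_block <= 0:
        raise ValueError("rows_per_block must be positive")
    access_block = access_row // rows_per_block
    refresh_block = refresh_row // rows_per_block
    return access_block == refresh_block
```

Under this assumption, an access to a different block than the one being refreshed proceeds without a wait.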

Proceedings ArticleDOI
27 Mar 2017
TL;DR: This work proposes Counter OVErflow ReducTion (COVERT), a CME-based memory encryption solution that performs on-demand memory allocation to reduce the memory encryption frequency of fast growing counters, while also retaining the area/performance benefits of small-sized counters.
Abstract: Security vulnerabilities arising from data persistence in emerging non-volatile memories (NVMs) necessitate memory encryption to ensure data security. Whereas counter mode encryption (CME) is a stop-gap practical approach to address this concern, it suffers from frequent memory re-encryption (system freeze) for small-sized counters and poor system performance for large-sized counters. CME thus imposes heavy overheads on memory, system performance, and system availability in practice. We propose Counter OVErflow ReducTion (COVERT), a CME-based memory encryption solution that performs on-demand memory allocation to reduce the memory encryption frequency of fast growing counters, while also retaining the area/performance benefits of small-sized counters. Our full-system simulations of a phase change memory (PCM) architecture across SPEC CPU2006 benchmarks show that for equivalent overhead and no impact to performance, COVERT simultaneously reduces the full memory re-encryption frequency from 6 minutes to 25 hours and doubles memory lifetime in comparison to state-of-the-art CME techniques.

Journal ArticleDOI
TL;DR: This paper quantitatively evaluated the data storage density of the MLC STT-RAM and proposed a new design through a cross-layer co-optimization that can improve the system performance and reduce the energy consumption on cache.
Abstract: Spin-transfer torque random access memory (STT-RAM), as an emerging nonvolatile memory technology, provides very dense array structure and extremely low leakage power consumption. It demonstrates a great potential in replacing conventional static random access memory technology to develop the next-generation on-chip cache memory of microprocessors and graphics processing units. The multilevel cell (MLC) design of STT-RAM that stores two or more bits in one cell potentially has higher storage capacity and faster system performance, attracting significant attention. In this paper, we first quantitatively evaluated the data storage density of the MLC STT-RAM. Our results revealed limited density improvement because of the large size of the access transistor induced by the high write current amplitude requirement and the asymmetry of switching behavior. Moreover, the read and write accesses of existing MLC STT-RAM cache designs require a two-step operation. The system level evaluation shows that the long access latency could amortize the performance speedup brought by larger cache size, and even degrade the system performance for some applications. To unleash the potential of MLC STT-RAM cache, we proposed a new design through a cross-layer co-optimization. The memory cell structure integrated the reversed stacking of magnetic junction tunneling for a more balanced device and design tradeoff. In architecture development, we presented an adaptive mode switching mechanism: based on application’s memory access behavior, the MLC STT-RAM cache can dynamically change between low latency single-level cell mode and high capacity MLC mode. Furthermore, we divided cache lines into fast and slow regions and investigated new data migration policies to allocate frequently accessed data to fast regions. Simulation results show that the proposed techniques can improve the system performance by 10.2% and reduce the energy consumption on cache by 9.5% compared with conventional MLC STT-RAM cache design.

Journal ArticleDOI
TL;DR: In this article, a content addressable memory (CAM) cell was proposed, which utilizes a phase change memory (PCM) as a storage element and an ambipolar transistor for data comparison.
Abstract: This paper presents the design of a content addressable memory (CAM) cell. This cell utilizes a phase change memory (PCM) as a storage element and an ambipolar transistor for data comparison; the operation of the ambipolar transistor is controlled by voltage at the polarity gate. A memory core consisting of a CMOS transistor and a PCM is employed (1T1P). For the search operation, the data in the 1T1P memory core are read and its values are established by using a differential sense amplifier. The proposed CAM cell is simulated and compared with other nonvolatile CAM cells by using emerging technologies (such as MTJ and memristor). The simulation results show that as the proposed CAM cell operates on a voltage basis, it offers significant advantages in terms of power delay product for the search operation and reduced circuit complexity (in terms of lower transistor and storage element counts) compared with other designs found in the technical literature.

Patent
06 Mar 2017
TL;DR: In this paper, a read sensing method for an OTP non-volatile memory is provided, where the memory array is connected with plural bit lines, and a selected memory cell is determined, wherein the selected memory cell is connected with a first bit line of the plural bit lines.
Abstract: A read sensing method for an OTP non-volatile memory is provided. The memory array is connected with plural bit lines. The read sensing method includes the following steps. Firstly, the plural bit lines are precharged to a precharge voltage. Then, a selected memory cell of the memory array is determined, wherein the selected memory cell is connected with a first bit line of the plural bit lines. Then, the bit line corresponding to the selected memory cell is connected with the data line, and the data line is discharged to a reset voltage. After a cell current from the selected memory cell is received, a voltage level of the data line is gradually changed from the reset voltage. According to a result of comparing a voltage level of the data line with a comparing voltage, an output signal is generated.